HUMAN DNA MISMATCH REPAIR PROTEINS



This invention relates to newly identified 
polynucleotides, polypeptides encoded by such 
polynucleotides, the use of such polynucleotides and 
polypeptides, as well as the production of such 
polynucleotides and polypeptides. More particularly, the 
polypeptides of the present invention are human homologs of 
the prokaryotic mutL4 gene and are hereinafter referred to as 
hMLH1, hMLH2 and hMLH3.

In both prokaryotes and eukaryotes, DNA mismatch repair
genes play a prominent role in the correction of errors
made during DNA replication and genetic recombination.
The E. coli methyl-directed DNA mismatch repair system is the
best understood DNA mismatch repair system to date. In
E. coli, this repair pathway involves the products of the
mutator genes mutS, mutL, mutH, and uvrD. Mutants in any one
of these genes display a mutator phenotype. MutS is a
DNA mismatch-binding protein which initiates this repair
process, UvrD is a DNA helicase and MutH is a latent
endonuclease that incises the unmethylated strand of a
hemi-methylated GATC sequence. MutL protein is believed to
recognize and bind to the mismatch-DNA-MutS-MutH complex to
enhance the endonuclease activity of MutH protein. After the
unmethylated DNA strand is cut by MutH, single-stranded
DNA-binding protein, DNA polymerase III, exonuclease I and
DNA ligase are required to complete this repair process
(Modrich P., Annu. Rev. Genetics, 25:229-53 (1991)).

Elements of the E. coli MutLHS system appear to be
conserved during evolution in prokaryotes and eukaryotes.
Genetic analysis suggests that Saccharomyces cerevisiae
has a mismatch repair system similar to the bacterial MutLHS
system. In S. cerevisiae, at least two MutL homologs, PMS1
and MLH1, have been reported. Mutation of either one of them
leads to a mitotic mutator phenotype (Prolla et al., Mol.
Cell. Biol. 14:407-415 (1994)). At least three MutS homologs
have been found in S. cerevisiae, namely MSH1, MSH2, and MSH3.
Disruption of the MSH2 gene affects nuclear mutation rates.
Mutants in S. cerevisiae MSH2, PMS1, and MLH1 have been found
to exhibit increased rates of expansion and contraction of
dinucleotide repeat sequences (Strand et al., Nature,
365:274-276 (1993)).

It has been reported that a number of human tumors such
as lung cancer, prostate cancer, ovarian cancer, breast
cancer, colon cancer and stomach cancer show instability of
repeated DNA sequences (Han et al., Cancer, 53:5087-5089
(1993); Thibodeau et al., Science 260:816-819 (1993);
Risinger et al., Cancer 53:5100-5103 (1993)). This
phenomenon suggests that a lack of DNA mismatch repair is
probably the cause of these tumors.

Little was known about the DNA mismatch repair system in
humans until recently, when the human homolog of the MutS gene
was cloned and found to be responsible for hereditary
nonpolyposis colon cancer (HNPCC) (Fishel et al., Cell,
75:1027-1038 (1993) and Leach et al., Cell, 75:1215-1225
(1993)). HNPCC was first linked to a locus at chromosome
2p16 which causes dinucleotide instability. It was then
demonstrated that a DNA mismatch repair protein (MutS)
homolog was located at this locus, and that C to T
transition mutations at several conserved regions were
specifically observed in HNPCC patients. Hereditary
nonpolyposis colorectal cancer is one of the most common
hereditable diseases of man, affecting as many as one in two
hundred individuals in the western world.

It has been demonstrated that hereditary colon cancer
can result from mutations in several loci. Familial
adenomatous polyposis coli (APC), linked to a gene on
chromosome 5, is responsible for a small minority of
hereditary colon cancer. Hereditary colon cancer is also
associated with Gardner's syndrome, Turcot's syndrome,
Peutz-Jeghers syndrome and juvenile polyposis coli. In
addition, hereditary nonpolyposis colon cancer may be
involved in 5% of all human colon cancer. All of the
different types of familial colon cancer have been shown to
be transmitted by a dominant autosomal mode of inheritance.

In addition to localization of HNPCC to the short arm
of chromosome 2, a second locus has been linked to a
predisposition to HNPCC (Lindholm, et al., Nature Genetics,
5:279-282 (1993)). A strong linkage was demonstrated between
a polymorphic marker on the short arm of chromosome 3 and the
disease locus.

This finding suggests that mutations in various DNA
mismatch repair proteins probably play crucial roles in the
development of human hereditary diseases and cancers.

HNPCC is characterized clinically by an apparent
autosomal dominantly inherited predisposition to cancer of
the colon, endometrium and other organs (Lynch, H.T. et
al., Gastroenterology, 104:1535-1549 (1993)). The
identification of markers at 2p16 and 3p21-22 which were
linked to disease in selected HNPCC kindred unequivocally
established its mendelian nature (Peltomaki, P. et al.,
Science, 260:810-812 (1993)). Tumors from HNPCC patients are
characterized by widespread alterations of simple repeated
sequences (microsatellites) (Aaltonen, L.A., et al., Science,
260:812-816 (1993)). This type of genetic instability was
originally observed in a subset (12 to 18%) of sporadic
colorectal cancers (Id.). Studies in bacteria and yeast
indicated that a defect in DNA mismatch repair genes can
result in a similar instability of microsatellites (Levinson,
G. and Gutman, G.A., Nuc. Acids Res., 15:5325-5338 (1987)),
and it was hypothesized that deficiency in mismatch repair
was responsible for HNPCC (Strand, M. et al., Nature,
365:274-276 (1993)). Analysis of extracts from HNPCC tumor
cell lines showed mismatch repair was indeed deficient,
adding definitive support to this conjecture (Parsons, R.P.,
et al., Cell, 75:1227-1236 (1993)). As not all HNPCC kindred
can be linked to the same loci, and as at least three genes
can produce a similar phenotype in yeast, it seems likely
that other mismatch repair genes could play a role in some
cases of HNPCC.

hMLH1 is most homologous to the yeast mutL-homolog yMLH1
while hMLH2 and hMLH3 have greater homology to the yeast
mutL-homolog yPMS1 (hMLH2 and hMLH3, due to their homology to
the yeast PMS1 gene, are sometimes referred to in the
literature as hPMS1 and hPMS2). In addition to hMLH1, both
the hMLH2 gene on chromosome 2q32 and the hMLH3 gene on
chromosome 7p22 were found to be mutated in the germ line of
HNPCC patients. This doubles the number of genes implicated
in HNPCC and may help explain the relatively high incidence
of this disease.

In accordance with one aspect of the present invention,
there are provided novel putative mature polypeptides which
are hMLH1, hMLH2 and hMLH3, as well as biologically active
and diagnostically or therapeutically useful fragments,
analogs and derivatives thereof. The polypeptides of the
present invention are of human origin.

In accordance with another aspect of the present 
invention, there are provided isolated nucleic acid molecules 
encoding such polypeptides, including mRNAs, DNAs, cDNAs,
genomic DNA as well as biologically active and diagnostically
or therapeutically useful fragments, analogs and derivatives
thereof.

In accordance with still another aspect of the present
invention there are provided nucleic acid probes comprising
nucleic acid molecules of sufficient length to specifically
hybridize to hMLH1, hMLH2 and hMLH3 sequences.

In accordance with yet a further aspect of the present
invention, there is provided a process for producing such
polypeptides by recombinant techniques which comprises
culturing recombinant prokaryotic and/or eukaryotic host
cells, containing an hMLH1, hMLH2 or hMLH3 nucleic acid
sequence, under conditions promoting expression of said
proteins and subsequent recovery of said proteins.

In accordance with yet a further aspect of the present 
invention, there is provided a process for utilizing such 
polypeptide, or polynucleotide encoding such polypeptide, for 
therapeutic purposes, for example, for the treatment of
cancers.

In accordance with another aspect of the present 
invention there is provided a method of diagnosing a disease 
or a susceptibility to a disease related to a mutation in the 
hMLH1, hMLH2 or hMLH3 nucleic acid sequences and the proteins
encoded by such nucleic acid sequences. 

In accordance with yet a further aspect of the present
invention, there is provided a process for utilizing such
polypeptides, or polynucleotides encoding such polypeptides,
for in vitro purposes related to scientific research,
synthesis of DNA and manufacture of DNA vectors.

These and other aspects of the present invention should
be apparent to those skilled in the art from the teachings
herein.

The following drawings are illustrative of embodiments 
of the invention and are not meant to limit the scope of the 
invention as encompassed by the claims. 

Figure 1 illustrates the cDNA sequence and corresponding
deduced amino acid sequence for the human DNA repair protein
hMLH1. The amino acids are represented by their standard
one-letter abbreviations. Sequencing was performed using a
373 Automated DNA sequencer (Applied Biosystems, Inc.).
Sequencing accuracy is predicted to be greater than 97%.

Figure 2 illustrates the cDNA sequence and corresponding 
deduced amino acid sequence of hMLH2. The amino acids are
represented by their standard one-letter abbreviations. 

Figure 3 illustrates the cDNA sequence and corresponding 
deduced amino acid sequence of hMLH3. The amino acids are
represented by their standard one-letter abbreviations. 

Figure 4. Alignment of the predicted amino acid
sequences of S. cerevisiae PMS1 (yPMS1) with the hMLH2 and
hMLH3 amino acid sequences using the MACAW (version 1.0)
program. Amino acids in conserved blocks are capitalized and
shaded based on the mean of their pair-wise scores.

Figure 5. Mutational analysis of hMLH2. (A) IVSP
analysis and mapping of the transcriptional stop mutation in
HNPCC patient CW. Translation of codons 1 to 369 (lane 1),
codons 1 to 290 (lane 2), and codons 1 to 214 (lane 3). CW
is translated from the cDNA of patient CW, while NOR was
translated from the cDNA of a normal individual. The
arrowheads indicate the truncated polypeptide due to the
potential stop mutation. The arrows indicate molecular
weight markers in kilodaltons. (B) Sequence analysis of CW
indicates a C to T transition at codon 233 (indicated by the
arrow). Lanes 1 and 3 are sequence derived from control
patients; lane 2 is sequence derived from genomic DNA of CW.
The ddA mixes from each sequencing mix were loaded in
adjacent lanes to facilitate comparison, as were those for
the ddC, ddG, and ddT mixes.

Figure 6. Mutational analysis of hMLH3. (A) IVSP
analysis of hMLH3 from patient GC. Lane GC is from
fibroblasts of individual GC; lane GCx is from the tumor of
patient GC; lanes NOR1 and 2 are from normal control
individuals. FL indicates full-length protein, and the
arrowheads indicate the germ line truncated polypeptide. The
arrows indicate molecular weight markers in kilodaltons. (B)
PCR analysis of DNA from patient GC shows that the lesion
is present in both hMLH3 alleles in tumor cells.
Amplification was done using primers that amplify 5', 3', or
within (MID) the region deleted in the cDNA. Lane 1, DNA
derived from fibroblasts of patient GC; lane 2, DNA derived
from tumor of patient GC; lane 3, DNA derived from a normal
control patient; lane 4, reactions without DNA template.
Arrows indicate molecular weight in base pairs.

In accordance with an aspect of the present invention, 
there are provided isolated nucleic acids (polynucleotides) 
which encode for the mature polypeptides having the deduced 
amino acid sequence of Figures 1, 2 and 3 (SEQ ID No. 2, 4 
and 6) or for the mature polypeptides encoded by the cDNA of 
the clone deposited as ATCC Deposit No. 75649, 75651, 75650, 
deposited on January 25, 1994. 

ATCC Deposit No. 75649 is a cDNA clone which contains
the full length sequence encoding the human DNA repair
protein referred to herein as hMLH1; ATCC Deposit No. 75651
is a cDNA clone containing the full length cDNA sequence
encoding the human DNA repair protein referred to herein as
hMLH2; ATCC Deposit No. 75650 is a cDNA clone containing the
full length DNA sequence referred to herein as hMLH3.

Polynucleotides encoding the polypeptides of the present
invention may be obtained from one or more libraries prepared
from heart, lung, prostate, spleen, liver, gallbladder, fetal
brain and testes tissues. The polynucleotides of hMLH1 were
discovered from a human gallbladder cDNA library. In
addition, six cDNA clones which are identical to the hMLH1 at
the N-terminal ends were obtained from human cerebellum,
eight-week embryo, fetal heart, HSC172 cells and Jurkat cell
cDNA libraries. The hMLH1 gene contains an open reading
frame of 756 amino acids encoding for an 85kD protein which
exhibits homology to the bacterial and yeast mutL proteins.
However, the 5' non-translated region was obtained from the
cDNA clone obtained from the fetal heart for the purpose of
extending the non-translated region to design the
oligonucleotides.

The hMLH2 gene was derived from a human T-cell lymphoma
cDNA library. The hMLH2 cDNA clone identified an open
reading frame of 2,796 base pairs flanked on both sides by
in-frame termination codons. It is structurally related to
the yeast PMS1 family. It contains an open reading frame
encoding a protein of 934 amino acid residues. The protein
exhibits the highest degree of homology to yeast PMS1 with
27% identity and 82% similarity over the entire protein.

A second region of significant homology among the three
PMS related proteins is in the carboxyl terminus, between
codons 800 to 900. This region shares a 22% and 47% homology
between yeast PMS1 protein and hMLH2 and hMLH3 proteins,
respectively, while very little homology of this region was
observed between these proteins and the other yeast mutL
homolog, yMLH1.

The hMLH3 gene was derived from a human endometrial
tumor cDNA library. The hMLH3 clone identified a 2,586 base
pair open reading frame. It is structurally related to the
yPMS2 protein family. It contains an open reading frame
encoding a protein of 862 amino acid residues. The protein
exhibits the highest degree of homology to yPMS2 with 32%
identity and 66% similarity over the entire amino acid
sequence.
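
The relationship between the reported hMLH3 open reading
frame length and the length of the encoded protein can be
checked with simple arithmetic. The following Python sketch
is illustrative only: the function name is hypothetical, and
it assumes that the stated base pair count covers sense
codons only (no termination codon).

```python
def residues_from_orf(orf_length_bp: int) -> int:
    """Return the number of amino acid residues encoded by an ORF.

    Assumption: the length counts sense codons only, so it must
    be an exact multiple of three.
    """
    if orf_length_bp <= 0 or orf_length_bp % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of three")
    return orf_length_bp // 3


# hMLH3 figures quoted in the text: 2,586 bp ORF, 862 residues.
hmlh3_residues = residues_from_orf(2586)
print(hmlh3_residues)  # 862
```

Under that assumption, a 2,586 base pair reading frame
corresponds to the 862 residues stated for hMLH3.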

It is significant with respect to a putative
identification of hMLH1, hMLH2 and hMLH3 that the GFRGEAL
domain which is conserved in mutL homologs derived from
E. coli is conserved in the amino acid sequences of hMLH1,
hMLH2 and hMLH3.

The polynucleotides of the present invention may be in
the form of RNA or in the form of DNA, which DNA includes
cDNA, genomic DNA, and synthetic DNA. The DNA may be
double-stranded or single-stranded, and if single stranded
may be the coding strand or non-coding (anti-sense) strand.
The coding sequence which encodes the mature polypeptide may
be identical to the coding sequence shown in Figures 1, 2 and
3 (SEQ ID No. 1) or that of the deposited clone or may be a
different coding sequence which coding sequence, as a result
of the redundancy or degeneracy of the genetic code, encodes
the same mature polypeptides as the DNA of Figures 1, 2 and
3 (SEQ ID No. 2, 4 and 6) or the deposited cDNA(s).
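
The degeneracy of the genetic code referred to above can be
illustrated with a short Python sketch. It uses a partial
codon table covering only the residues of the GFRGEAL domain
mentioned in this description. The two DNA sequences are
hypothetical examples chosen for illustration; they are not
taken from the Figures or the deposited clones.

```python
# Partial standard codon table: only the residues found in GFRGEAL.
CODONS = {
    "G": ("GGT", "GGC", "GGA", "GGG"),
    "F": ("TTT", "TTC"),
    "R": ("CGT", "CGC", "CGA", "CGG", "AGA", "AGG"),
    "E": ("GAA", "GAG"),
    "A": ("GCT", "GCC", "GCA", "GCG"),
    "L": ("TTA", "TTG", "CTT", "CTC", "CTA", "CTG"),
}
CODON_TO_RESIDUE = {
    codon: residue for residue, codons in CODONS.items() for codon in codons
}


def translate(dna: str) -> str:
    """Translate a coding-strand DNA sequence read in frame from position 0."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(
        CODON_TO_RESIDUE[dna[i:i + 3]] for i in range(0, len(dna), 3)
    )


# Two different (hypothetical) coding sequences for the same peptide.
seq_a = "GGTTTTCGTGGTGAAGCTTTA"
seq_b = "GGCTTCAGAGGGGAGGCACTG"

print(translate(seq_a))  # GFRGEAL
print(translate(seq_b))  # GFRGEAL
```

The two sequences differ at the nucleotide level yet encode
the same peptide, which is the sense in which a "different
coding sequence" can encode the same mature polypeptide.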

The polynucleotides which encode for the mature 
polypeptides of Figures 1, 2 and 3 (SEQ ID No. 2, 4 and 6) or
for the mature polypeptides encoded by the deposited cDNAs 
may include: only the coding sequence for the mature 
polypeptide; the coding sequence for the mature polypeptide 
(and optionally additional coding sequence) and non-coding 
sequence, such as introns or non-coding sequence 5' and/or 3' 
of the coding sequence for the mature polypeptide. 

Thus, the term "polynucleotide encoding a polypeptide" 
encompasses a polynucleotide which includes only coding 
sequence for the polypeptide as well as a polynucleotide 
which includes additional coding and/or non-coding sequence. 

The present invention further relates to variants of the
hereinabove described polynucleotides which encode for
fragments, analogs and derivatives of the polypeptides having
the deduced amino acid sequences of Figures 1, 2 and 3 (SEQ
ID No. 2, 4 and 6) or the polypeptides encoded by the cDNA of
the deposited clones. The variants of the polynucleotides
may be a naturally occurring allelic variant of the
polynucleotides or a non-naturally occurring variant of the
polynucleotides.

Thus, the present invention includes polynucleotides 
encoding the same mature polypeptides as shown in Figures 1,
2 and 3 (SEQ ID No. 2, 4 and 6) or the same mature
polypeptides encoded by the cDNA of the deposited clones as 
well as variants of such polynucleotides which variants 
encode for a fragment, derivative or analog of the 
polypeptides of Figures 1, 2 and 3 (SEQ ID No. 2, 4 and 6) or 
the polypeptides encoded by the cDNA of the deposited clones. 
Such nucleotide variants include deletion variants, 
substitution variants and addition or insertion variants. 

As hereinabove indicated, the polynucleotides may have 
a coding sequence which is a naturally occurring allelic 
variant of the coding sequence shown in Figures 1, 2 and 3 
(SEQ ID No. 1, 3 and 5) or of the coding sequence of the 
deposited clones. As known in the art, an allelic variant is 
an alternate form of a polynucleotide sequence which may have 
a substitution, deletion or addition of one or more 
nucleotides, which does not substantially alter the function 
of the encoded polypeptide. 

The polynucleotides of the present invention may also 
have the coding sequence fused in frame to a marker sequence 
which allows for purification of the polypeptides of the 
present invention. The marker sequence may be, for example, 
a hexa-histidine tag supplied by a pQE-9 vector to provide 
for purification of the mature polypeptides fused to the 
marker in the case of a bacterial host, or, for example, the 
marker sequence may be a hemagglutinin (HA) tag when a 
mammalian host, e.g. COS-7 cells, is used. The HA tag
corresponds to an epitope derived from the influenza
hemagglutinin protein (Wilson, I., et al., Cell, 37:767
(1984)).
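
The requirement that a marker sequence be fused "in frame"
can be sketched in Python as follows. This is a simplified
illustration, not a description of the pQE-9 vector itself:
the function names are hypothetical, the tag uses the
histidine codon CAT (CAC would serve equally), and the coding
fragment is an arbitrary example. Start codons, linkers and
vector-derived sequence are ignored.

```python
HIS_CODON = "CAT"  # one of the two histidine codons (CAT, CAC)


def hexa_his_tag() -> str:
    """Return an 18-nucleotide sequence encoding six histidines."""
    return HIS_CODON * 6


def fuse_in_frame(marker: str, coding: str) -> str:
    """Join marker and coding sequence, preserving the reading frame.

    Both parts must contain whole codons; otherwise the coding
    sequence would be read out of frame after the marker.
    """
    for name, seq in (("marker", marker), ("coding", coding)):
        if len(seq) % 3 != 0:
            raise ValueError(f"{name} length is not a multiple of three")
    return marker + coding


# Hypothetical 9-nucleotide coding fragment (three codons).
fusion = fuse_in_frame(hexa_his_tag(), "GGTTTTCGT")
print(len(fusion) // 3)  # 9 codons: six from the tag, three from the fragment
```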

The present invention further relates to
polynucleotides which hybridize to the hereinabove-described
sequences if there is at least 50% and preferably 70%
identity between the sequences. The present invention
particularly relates to polynucleotides which hybridize under
stringent conditions to the hereinabove-described
polynucleotides. As herein used, the term "stringent
conditions" means hybridization will occur only if there is
at least 95% and preferably at least 97% identity between the
sequences. The polynucleotides which hybridize to the
hereinabove described polynucleotides in a preferred
embodiment encode polypeptides which retain substantially the
same biological function or activity as the mature
polypeptides encoded by the cDNA of Figures 1, 2 and 3 (SEQ
ID No. 1, 3 and 5) or the deposited cDNA(s).
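
The identity thresholds stated above can be made concrete
with a short Python sketch. It computes percent identity
between two pre-aligned sequences of equal length and compares
the result with the 95% figure given for "stringent
conditions". The function names and the example sequences are
hypothetical, and the sketch does not model hybridization
chemistry; it only illustrates the identity calculation.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of positions at which two aligned sequences match."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def meets_threshold(a: str, b: str, minimum: float = 95.0) -> bool:
    """True if identity reaches the given minimum (95% by default)."""
    return percent_identity(a, b) >= minimum


reference = "ATGCATGCATGCATGCATGC"       # 20 nucleotides
one_mismatch = "ATGCATGCATGCATGCATGA"    # differs at 1 position
two_mismatches = "TTGCATGCATGCATGCATGA"  # differs at 2 positions

print(percent_identity(reference, one_mismatch))    # 95.0
print(meets_threshold(reference, one_mismatch))     # True
print(meets_threshold(reference, two_mismatches))   # False
```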

The deposit(s) referred to herein will be maintained
under the terms of the Budapest Treaty on the International
Recognition of the Deposit of Micro-organisms for purposes of
Patent Procedure. These deposits are provided merely as a
convenience to those of skill in the art and are not an
admission that a deposit is required under 35 U.S.C. §112. 
The sequence of the polynucleotides contained in the 
deposited materials, as well as the amino acid sequence of 
the polypeptides encoded thereby, are incorporated herein by 
reference and are controlling in the event of any conflict 
with any description of sequences herein. A license may be 
required to make, use or sell the deposited materials, and 
no such license is hereby granted. 

The present invention further relates to polypeptides
which have the deduced amino acid sequence of Figures 1, 2
and 3 (SEQ ID No. 2, 4 and 6) or which have the amino acid
sequence encoded by the deposited cDNA(s), as well as
fragments, analogs and derivatives of such polypeptides.

The terms "fragment," "derivative" and "analog" when
referring to the polypeptides of Figures 1, 2 and 3 (SEQ ID
No. 2, 4 and 6) or that encoded by the deposited cDNA(s),
mean polypeptides which retain essentially the same
biological function or activity as such polypeptides. Thus,
an analog includes a proprotein which can be activated by
cleavage of the proprotein portion to produce an active
mature polypeptide.

The polypeptides of the present invention may be a 
recombinant polypeptide, a natural polypeptide or a synthetic 
polypeptide, preferably a recombinant polypeptide. 

The fragment, derivative or analog of the polypeptides 
of Figures 1, 2 and 3 (SEQ ID No. 2, 4 and 6) or that encoded 
by the deposited cDNAs may be (i) one in which one or more of 
the amino acid residues are substituted with a conserved or 
non-conserved amino acid residue (preferably a conserved 
amino acid residue) and such substituted amino acid residue 
may or may not be one encoded by the genetic code, or (ii) 
one in which one or more of the amino acid residues includes 
a substituent group, or (iii) one in which the mature 
polypeptide is fused with another compound, such as a 
compound to increase the half-life of the polypeptide (for 
example, polyethylene glycol). Such fragments, derivatives 
and analogs are deemed to be within the scope of those 
skilled in the art from the teachings herein. 

The polypeptides and polynucleotides of the present 
invention are preferably provided in an isolated form, and 
preferably are purified to homogeneity. 

The term "isolated" means that the material is removed
from its original environment (e.g., the natural environment
if it is naturally occurring). For example, a
naturally-occurring polynucleotide or polypeptide present in
a living animal is not isolated, but the same polynucleotide
or polypeptide, separated from some or all of the co-existing
materials in the natural system, is isolated. Such
polynucleotides could be part of a vector and/or such
polynucleotides or polypeptides could be part of a
composition, and still be isolated in that such vector or
composition is not part of its natural environment.

The present invention also relates to vectors which
include polynucleotides of the present invention, host cells
which are genetically engineered with vectors of the
invention and the production of polypeptides of the invention
by recombinant techniques.

Host cells are genetically engineered (transduced or
transformed or transfected) with the vectors of this
invention which may be, for example, a cloning vector or an
expression vector. The vector may be, for example, in the
form of a plasmid, a viral particle, a phage, etc. The
engineered host cells can be cultured in conventional
nutrient media modified as appropriate for activating
promoters, selecting transformants or amplifying the hMLH1,
hMLH2 and hMLH3 genes. The culture conditions, such as
temperature, pH and the like, are those previously used with
the host cell selected for expression, and will be apparent
to the ordinarily skilled artisan.

The polynucleotides of the present invention may be 
employed for producing polypeptides by recombinant 
techniques. Thus, for example, the polynucleotide may be 
included in any one of a variety of expression vectors for 
expressing a polypeptide. Such vectors include chromosomal,
nonchromosomal and synthetic DNA sequences, e.g.,
derivatives of SV40; bacterial plasmids; phage DNA;
baculovirus; yeast plasmids; vectors derived from
combinations of plasmids and phage DNA, viral DNA such as
vaccinia, adenovirus, fowl pox virus, and pseudorabies.
However, any other vector may be used as long as it is
replicable and viable in the host.

The appropriate DNA sequence may be inserted into the
vector by a variety of procedures. In general, the DNA
sequence is inserted into an appropriate restriction
endonuclease site(s) by procedures known in the art. Such
procedures and others are deemed to be within the scope of
those skilled in the art.

The DNA sequence in the expression vector is operatively
linked to an appropriate expression control sequence(s)
(promoter) to direct mRNA synthesis. As representative
examples of such promoters, there may be mentioned: LTR or
SV40 promoter, the E. coli lac or trp, the phage lambda PL
promoter and other promoters known to control expression of
genes in prokaryotic or eukaryotic cells or their viruses.
The expression vector also contains a ribosome binding site
for translation initiation and a transcription terminator.
The vector may also include appropriate sequences for
amplifying expression.

In addition, the expression vectors preferably contain
one or more selectable marker genes to provide a phenotypic
trait for selection of transformed host cells such as
dihydrofolate reductase or neomycin resistance for eukaryotic
cell culture, or such as tetracycline or ampicillin
resistance in E. coli.

The vector containing the appropriate DNA sequence as 
hereinabove described, as well as an appropriate promoter or 
control sequence, may be employed to transform an appropriate 
host to permit the host to express the proteins. 

As representative examples of appropriate hosts, there
may be mentioned: bacterial cells, such as E. coli,
Streptomyces, Salmonella typhimurium; fungal cells, such as
yeast; insect cells such as Drosophila S2 and Spodoptera Sf9;
animal cells such as CHO, COS or Bowes melanoma;
adenoviruses; plant cells, etc. The selection of an
appropriate host is deemed to be within the scope of those
skilled in the art from the teachings herein.

More particularly, the present invention also includes
recombinant constructs comprising one or more of the
sequences as broadly described above. The constructs
comprise a vector, such as a plasmid or viral vector, into
which a sequence of the invention has been inserted, in a
forward or reverse orientation. In a preferred aspect of
this embodiment, the construct further comprises regulatory
sequences, including, for example, a promoter, operably
linked to the sequence. Large numbers of suitable vectors
and promoters are known to those of skill in the art, and are
commercially available. The following vectors are provided
by way of example. Bacterial: pQE70, pQE60, pQE-9 (Qiagen,
Inc.), pbs, pD10, phagescript, psiX174, pbluescript SK,
pbsks, pNH8A, pNH16a, pNH18A, pNH46A (Stratagene); ptrc99a,
pKK223-3, pKK233-3, pDR540, pRIT5 (Pharmacia). Eukaryotic:
pWLNEO, pSV2CAT, pOG44, pXT1, pSG (Stratagene); pSVK3, pBPV,
pMSG, pSVL (Pharmacia). However, any other plasmid or vector
may be used as long as it is replicable and viable in the
host.

Promoter regions can be selected from any desired gene
using CAT (chloramphenicol transferase) vectors or other
vectors with selectable markers. Two appropriate vectors are
pKK232-8 and pCM7. Particular named bacterial promoters
include lacI, lacZ, T3, T7, gpt, lambda PR, PL and trp.
Eukaryotic promoters include CMV immediate early, HSV
thymidine kinase, early and late SV40, LTRs from retrovirus,
and mouse metallothionein-I. Selection of the appropriate
vector and promoter is well within the level of ordinary
skill in the art.

In a further embodiment, the present invention relates
to host cells containing the above-described constructs. The
host cell can be a higher eukaryotic cell, such as a
mammalian cell, or a lower eukaryotic cell, such as a yeast
cell, or the host cell can be a prokaryotic cell, such as a
bacterial cell. Introduction of the construct into the host
cell can be effected by calcium phosphate transfection,
DEAE-Dextran mediated transfection, or electroporation
(Davis, L., Dibner, M., Battey, I., Basic Methods in
Molecular Biology, (1986)).

The constructs in host cells can be used in a 
conventional manner to produce the gene product encoded by 
the recombinant sequence. Alternatively, the polypeptides of
the invention can be synthetically produced by conventional
peptide synthesizers.

Mature proteins can be expressed in mammalian cells, 
yeast, bacteria, or other cells under the control of 
appropriate promoters. Cell-free translation systems can 
also be employed to produce such proteins using RNAs derived 
from the DNA constructs of the present invention. 
Appropriate cloning and expression vectors for use with 
prokaryotic and eukaryotic hosts are described by Sambrook, 
et al., Molecular Cloning: A Laboratory Manual, Second 
Edition, Cold Spring Harbor, N.Y., (1989), the disclosure of
which is hereby incorporated by reference. 

Transcription of the DNA encoding the polypeptides of
the present invention by higher eukaryotes is increased by
inserting an enhancer sequence into the vector. Enhancers
are cis-acting elements of DNA, usually from about 10 to 300
bp, that act on a promoter to increase its transcription.
Examples include the SV40 enhancer on the late side of the
replication origin (bp 100 to 270), a cytomegalovirus early
promoter enhancer, the polyoma enhancer on the late side of
the replication origin, and adenovirus enhancers.

Generally, recombinant expression vectors will include
origins of replication and selectable markers permitting
transformation of the host cell, e.g., the ampicillin
resistance gene of E. coli and S. cerevisiae TRP1 gene, and
a promoter derived from a highly-expressed gene to direct
transcription of a downstream structural sequence. Such
promoters can be derived from operons encoding glycolytic
enzymes such as 3-phosphoglycerate kinase (PGK), α-factor,
acid phosphatase, or heat shock proteins, among others. The
heterologous structural sequence is assembled in appropriate
phase with translation initiation and termination sequences.
Optionally, the heterologous sequence can encode a fusion
protein including an N-terminal identification peptide
imparting desired characteristics, e.g., stabilization or
simplified purification of expressed recombinant product.

Useful expression vectors for bacterial use are
constructed by inserting a structural DNA sequence encoding
a desired protein together with suitable translation
initiation and termination signals in operable reading phase
with a functional promoter. The vector will comprise one or
more phenotypic selectable markers and an origin of
replication to ensure maintenance of the vector and to, if
desirable, provide amplification within the host. Suitable
prokaryotic hosts for transformation include E. coli,
Bacillus subtilis, Salmonella typhimurium and various species
within the genera Pseudomonas, Streptomyces, and
Staphylococcus, although others may also be employed as a
matter of choice.

As a representative but nonlimiting example, useful
expression vectors for bacterial use can comprise a
selectable marker and bacterial origin of replication derived
from commercially available plasmids comprising genetic
elements of the well known cloning vector pBR322 (ATCC
37017). Such commercial vectors include, for example,
pKK223-3 (Pharmacia Fine Chemicals, Uppsala, Sweden) and GEM1
(Promega Biotec, Madison, WI, USA). These pBR322 "backbone"
sections are combined with an appropriate promoter and the 
structural sequence to be expressed. 

Following transformation of a suitable host strain and 
growth of the host strain to an appropriate cell density, the 
selected promoter is induced by appropriate means (e.g., 
temperature shift or chemical induction) and cells are 
cultured for an additional period.

Cells are typically harvested by centrifugation,
disrupted by physical or chemical means, and the resulting 
crude extract retained for further purification. 

Microbial cells employed in expression of proteins can 
be disrupted by any convenient method, including freeze-thaw
cycling, sonication, mechanical disruption, or use of cell
lysing agents; such methods are well known to those skilled in
the art.

Various mammalian cell culture systems can also be 
employed to express recombinant protein. Examples of 
mammalian expression systems include the COS-7 lines of
monkey kidney fibroblasts, described by Gluzman, Cell, 23:175
(1981), and other cell lines capable of expressing a
compatible vector, for example, the C127, 3T3, CHO, HeLa and
BHK cell lines. Mammalian expression vectors will comprise
an origin of replication, a suitable promoter and enhancer,
and also any necessary ribosome binding sites,
polyadenylation site, splice donor and acceptor sites,
transcriptional termination sequences, and 5' flanking
nontranscribed sequences. DNA sequences derived from the
SV40 splice and polyadenylation sites may be used to provide
the required nontranscribed genetic elements. 

The polypeptides can be recovered and purified from 
recombinant cell cultures by methods including ammonium 
sulfate or ethanol precipitation, acid extraction, anion or 
cation exchange chromatography, phosphocellulose 
chromatography, hydrophobic interaction chromatography, 
affinity chromatography, hydroxylapatite chromatography and 
lectin chromatography. Protein refolding steps can be used, 
as necessary, in completing configuration of the mature 
protein. Finally, high performance liquid chromatography
(HPLC) can be employed for final purification steps. 

The polypeptides of the present invention may be a 
naturally purified product, or a product of chemical 
synthetic procedures, or produced by recombinant techniques
from a prokaryotic or eukaryotic host (for example, by
bacterial, yeast, higher plant, insect and mammalian cells in
culture). Depending upon the host employed in a recombinant
production procedure, the polypeptides of the present
invention may be glycosylated or may be non-glycosylated.

In accordance with a further aspect of the invention, 
there is provided a process for determining susceptibility to 
cancer, in particular, a hereditary cancer. Thus, a mutation 
in a human repair protein, which is a human homolog of mutL, 
and in particular those described herein, indicates a 
susceptibility to cancer, and the nucleic acid sequences 
encoding such human homologs may be employed in an assay for 
ascertaining such susceptibility. Thus, for example, the 
assay may be employed to determine a mutation in a human DNA
repair protein as herein described, such as a deletion, 
truncation, insertion, frame shift, etc., with such mutation 
being indicative of a susceptibility to cancer. 

A mutation may be ascertained, for example, by a DNA
sequencing assay. Tissue samples, including but not limited
to blood samples, are obtained from a human patient. The
samples are processed by methods known in the art to capture
the RNA. First strand cDNA is synthesized from the RNA
samples by adding an oligonucleotide primer consisting of
polythymidine residues which hybridize to the polyadenosine
stretch present on the mRNAs. Reverse transcriptase and
deoxynucleotides are added to allow synthesis of the first
strand cDNA. Primer sequences are synthesized based on the
DNA sequence of the DNA repair protein of the invention. The
primer sequence is generally comprised of 15 to 30 and
preferably from 18 to 25 consecutive bases of the human DNA
repair gene. Table 1 sets forth an illustrative example of
oligonucleotide primer sequences based on hMLH1. The primers
are used in pairs (one "sense" strand and one "anti-sense")
to amplify the cDNA from the patients by the PCR method
(Saiki et al., Nature, 324:163-166 (1986)) such that three
overlapping fragments of the patient's cDNAs for such
protein are generated. Table 1 also shows a list of
preferred primer sequence pairs. The overlapping fragments
are then subjected to dideoxynucleotide sequencing using a
set of primer sequences synthesized to correspond to the base
pairs of the cDNAs at a point approximately every 200 base
pairs throughout the gene.

TABLE 1

Primer Sequences Used to Amplify Gene Region Using PCR

Name    Start Site* and Arrangement    Sequence

758     sense-(-41)                    GTTGAACATCTAGACGTCTC
1319    sense-8                        TCGTGGCAGGGGTTATTCG
1321    sense-619                      CTACCCAATGCCTCAACCG
1322    sense-677                      GAGAACTGATAGAAATTGGATG
1314    sense-1548                     GGGACATGAGGTTCTCCG
1323    sense-1593                     GGGCTGTGTGAATCCTCAG
773     anti-53                        CGGTTCACCACTGTCTCGTC
1313    anti-971                       TCCAGGATGCTCTCCTCG
1320    anti-1057                      CAAGTCCTGGTAGCAAAGTC
1315    anti-1760                      ATGGCAAGGTCAAAGAGCG
1316    anti-1837                      CAACAATGTATTCAGXAAGTCC
1317    anti-2340                      TTGATACAACACrrrGTATCG
1318    anti-2415                      GGAATACTATCAGAAGGCAAG

* Numbers corresponding to location along nucleotide
sequence of Figure 1 where ATG is number 1.

Preferred primer sequence pairs:

758, 1313
1319, 1320
660, 1909
725, 1995
1680, 2536
1727, 2610

The nucleotide sequences shown in Table 1 represent SEQ ID
No. 7 through 19, respectively.

Table 2 lists representative examples of
oligonucleotide primer sequences (sense and anti-sense)
which may be used, and preferably the entire set of primer
sequences is used for sequencing to determine where a
mutation in the patient DNA repair protein may be. The
primer sequences may be from 15 to 30 bases in length and
are preferably between 18 and 25 bases in length. The
sequence information determined from the patient is then
compared to non-mutated sequences to determine if any
mutations are present.



TABLE 2

Primer Sequences Used to Sequence the Amplified Fragments

Name    Number    Start Site* and Arrangement    Sequence

5282    seq01     sense-377                      ACAGAGCAAGTTACTCAGATG
5283    seq02     sense-552                      GTACACAATGCAGGCATTAG
5284    seq03     sense-904                      AATGTGGATGTTAATGTGCAC
5285    seq04     sense-1096                     CTGACCTCGTCTTCCTAC
5286    seq05     sense-1276                     CAGCAAGATGAGGAGATGC
5287    seq06     sense-1437                     GGAAATGGTGGAAGATGATTC
5288    seq07     sense-1645                     CTTCTCAACACCAAGC
5289    seq08     sense-1895                     GAAATTGATGAGGAAGGGAAC
5295    seq09     sense-1921                     CTTCTGATTGACAACTATGTGC
5294    seq10     sense-2202                     CACAGAAGATGGAAATATCCTG
5293    seq11     sense-2370                     GTGTTGGTAGCACTTAAGAC
5291    seq12     anti-525                       TTTCCCATATTCTTCACTTG
5290    seq13     anti-341                       GTAACATGAGCCACATGGC
5292    seq14     anti-46                        CCACTGTCTCGTCCAGCCG

* Numbers corresponding to location along nucleotide
sequence of Figure 1 where ATG is number 1.
The nucleotide sequences shown in Table 2 represent SEQ ID
No. 20 through 33, respectively.

In another embodiment, the primer sequences from Table
2 could be used in the PCR method to amplify a mutated
region. The region could be sequenced and used as a
diagnostic to predict a predisposition to such mutated
genes.
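
The primer length ranges stated above (15 to 30 bases, preferably 18 to 25) can be checked mechanically. The following is a minimal sketch only; the function name and the labels it returns are illustrative assumptions, and the two example primers are seq01 and seq07 as transcribed in Table 2.

```python
def classify_primer(seq: str) -> str:
    """Classify a primer by length, using the ranges stated in the text."""
    seq = seq.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must be a non-empty string of A, C, G, T")
    n = len(seq)
    if 18 <= n <= 25:
        return "preferred"      # 18 to 25 bases
    if 15 <= n <= 30:
        return "acceptable"     # inside 15 to 30 but outside 18 to 25
    return "out of range"


# Two sequencing primers from Table 2 (seq01 and seq07).
for name, primer in [("seq01", "ACAGAGCAAGTTACTCAGATG"),
                     ("seq07", "CTTCTCAACACCAAGC")]:
    print(name, len(primer), classify_primer(primer))
```

Under this check a primer may fall inside the general range while lying outside the preferred one.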

Alternatively, the assay to detect mutations in the
genes of the present invention may be performed by genetic
testing based on DNA sequence differences achieved by
detection of alteration in electrophoretic mobility of DNA
fragments in gels with or without denaturing agents. Small
sequence deletions and insertions can be visualized by high
resolution gel electrophoresis. DNA fragments of different
sequences may be distinguished on denaturing formamide
gradient gels in which the mobilities of different DNA
fragments are retarded in the gel at different positions
according to their specific melting or partial melting
temperatures (see, e.g., Myers et al., Science, 230:1242
(1985)).

Sequence changes at specific locations may also be 
revealed by nuclease protection assays, such as RNase and 
S1 protection or the chemical cleavage method (e.g., Cotton
et al., PNAS, USA, 85:4397-4401 (1985)). Perfectly matched
sequences can be distinguished from mismatched duplexes by
RNase A digestion or by differences in melting
temperatures.

Thus, the detection of a specific DNA sequence may be 
achieved by methods such as hybridization, RNase 
protection, chemical cleavage, Western Blot analysis,
direct DNA sequencing or the use of restriction enzymes
(e.g., Restriction Fragment Length Polymorphisms (RFLP))
and Southern blotting of genomic DNA.

In addition to more conventional gel electrophoresis
and DNA sequencing, mutations can also be detected by in 
situ analysis. 

The polypeptides may also be employed to treat cancers 
or to prevent cancers, by expression of such polypeptides 
in vivo, which is often referred to as "gene therapy." 

Thus, for example, cells from a patient may be 
engineered with a polynucleotide (DNA or RNA) encoding a 
polypeptide ex vivo, with the engineered cells then being 
provided to a patient to be treated with the polypeptide. 
Such methods are well-known in the art. For example, cells
may be engineered by procedures known in the art by use of 
a retroviral particle containing RNA encoding a polypeptide 
of the present invention. 

Similarly, cells may be engineered in vivo for 
expression of a polypeptide in vivo by, for example, 
procedures known in the art. As known in the art, a
producer cell for producing a retroviral particle 
containing RNA encoding the polypeptide of the present 
invention may be administered to a patient for engineering 
cells in vivo and expression of the polypeptide in vivo. 
These and other methods for administering a polypeptide of 
the present invention by such method should be apparent to 
those skilled in the art from the teachings of the present
invention. For example, the expression vehicle for
engineering cells may be other than a retrovirus, for 
example, an adenovirus which may be used to engineer cells 
in vivo after combination with a suitable delivery vehicle. 

Each of the cDNA sequences identified herein or a 
portion thereof can be used in numerous ways as 
polynucleotide reagents. The sequences can be used as 
diagnostic probes for the presence of a specific mRNA in a 
particular cell type. In addition, these sequences can be
used as diagnostic probes suitable for use in genetic
linkage analysis (polymorphisms).

The sequences of the present invention are also 
valuable for chromosome identification. The sequence is 
specifically targeted to and can hybridize with a 
particular location on an individual human chromosome. 
Moreover, there is a current need for identifying 
particular sites on the chromosome. Few chromosome marking 
reagents based on actual sequence data (repeat 
polymorphisms) are presently available for marking 
chromosomal location. The mapping of DNAs to chromosomes
according to the present invention is an important first
step in correlating those sequences with genes associated
with disease.

Briefly, sequences can be mapped to chromosomes by 
preparing PCR primers (preferably 15-25 bp) from the cDNA. 
Computer analysis of the 3' untranslated region is used to
rapidly select primers that do not span more than one exon
in the genomic DNA, since spanning more than one exon would
complicate the amplification process. These primers are then
used for PCR screening of somatic cell hybrids containing
individual human chromosomes. Only those hybrids containing
the human gene corresponding to the primer will yield an
amplified fragment.

PCR mapping of somatic cell hybrids is a rapid 
procedure for assigning a particular DNA to a particular 
chromosome. Using the present invention with the same 
oligonucleotide primers, sublocalization can be achieved 
with panels of fragments from specific chromosomes or pools 
of large genomic clones in an analogous manner. Other 
mapping strategies that can similarly be used to map to its 
chromosome include in situ hybridization, prescreening with
labeled flow-sorted chromosomes and preselection by
hybridization to construct chromosome-specific cDNA
libraries.

Fluorescence in situ hybridization (FISH) of a cDNA
clone to a metaphase chromosomal spread can be used to
provide a precise chromosomal location in one step. This
technique can be used with cDNA as short as 500 or 600
bases; however, clones larger than that have a higher
likelihood of binding to a unique chromosomal location with
sufficient signal intensity for simple detection. FISH
requires use of the clones from which the expressed sequence
tag or EST was derived, and the longer the better. For
example, 2,000 bp is good, 4,000 is better, and more than
4,000 is probably not necessary to get good results a
reasonable percentage of the time. For a review of this
technique, see Verma et al., Human Chromosomes: a Manual
of Basic Techniques, Pergamon Press, New York (1988).

Once a sequence has been mapped to a precise
chromosomal location, the physical position of the sequence
on the chromosome can be correlated with genetic map data.
Such data are found, for example, in V. McKusick, Mendelian
Inheritance in Man (available on line through Johns Hopkins
University Welch Medical Library). The relationship
between genes and diseases that have been mapped to the
same chromosomal region is then identified through linkage
analysis (coinheritance of physically adjacent genes).

Next, it is necessary to determine the differences in 
the cDNA or genomic sequence between affected and 
unaffected individuals. If a mutation is observed in some 
or all of the affected individuals but not in any normal 
individuals, then the mutation is likely to be the 
causative agent of the disease. 

With current resolution of physical mapping and
genetic mapping techniques, a cDNA precisely localized to a
chromosomal region associated with the disease could be one
of between 50 and 500 potential causative genes. (This
assumes 1 megabase mapping resolution and one gene per 20
kb.)
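
The arithmetic behind this estimate can be sketched as follows. With one gene per 20 kb, a 1 megabase interval holds about 50 candidate genes. The upper figure of 500 would correspond to an interval of roughly 10 megabases at the same gene density; that 10 megabase value is an inference from the stated assumptions, not a number given in the text, and the function name is illustrative.

```python
def candidate_genes(interval_bp: int, bp_per_gene: int = 20_000) -> int:
    """Estimate the number of genes lying inside a mapped interval."""
    if interval_bp <= 0 or bp_per_gene <= 0:
        raise ValueError("interval and gene spacing must be positive")
    return interval_bp // bp_per_gene


# 1 Mb resolution is stated in the text; 10 Mb is an assumed coarser case.
print(candidate_genes(1_000_000))    # lower end of the quoted range
print(candidate_genes(10_000_000))   # upper end under the assumed 10 Mb
```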

hMLH2 has been localized using a genomic P1 clone
(1670) which contained the 5' region of the hMLH2 gene.
Detailed analysis of human metaphase chromosome spreads,
counterstained to reveal banding, indicated that the hMLH2
gene was located within band 2q32. Likewise, hMLH3 was
localized using a genomic P1 clone (2053) which contained
the 3' region of the hMLH3 gene. Detailed analysis of
human metaphase chromosome spreads, counterstained to
reveal banding, indicated that the hMLH3 gene was located
within band 7p22, the most distal band on chromosome 7.
Analysis with a variety of genomic clones showed that hMLH3
was a member of a subfamily of related genes, all on
chromosome 7.

The polypeptides, their fragments or other 
derivatives, or analogs thereof, or cells expressing them 
can be used as an immunogen to produce antibodies thereto. 
These antibodies can be, for example, polyclonal or
monoclonal antibodies. The present invention also includes 
chimeric, single chain, and humanized antibodies, as well 
as Fab fragments, or the product of an Fab expression 
library. Various procedures known in the art may be used 
for the production of such antibodies and fragments. 

Antibodies generated against the polypeptides 
corresponding to a sequence of the present invention can be 
obtained by direct injection of the polypeptides into an 
animal or by administering the polypeptides to an animal, 
preferably a nonhuman. The antibody so obtained will then 
bind the polypeptides themselves. In this manner, even a
sequence encoding only a fragment of the polypeptides can
be used to generate antibodies binding the whole native
polypeptides. Such antibodies can then be used to isolate
the polypeptide from tissue expressing that polypeptide. 

For preparation of monoclonal antibodies, any 
technique which provides antibodies produced by continuous 
cell line cultures can be used. Examples include the 
hybridoma technique (Kohler and Milstein, 1975, Nature, 
256:495-497), the trioma technique, the human B-cell 
hybridoma technique (Kozbor et al., 1983, Immunology Today
4:72), and the EBV-hybridoma technique to produce human
monoclonal antibodies (Cole, et al., 1985, in Monoclonal
Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 77-96).

Techniques described for the production of single 
chain antibodies (U.S. Patent 4,946,778) can be adapted to 
produce single chain antibodies to immunogenic polypeptide 
products of this invention. Also, transgenic mice may be 
used to express humanized antibodies to immunogenic 
polypeptide products of this invention. 

The present invention will be further described with 
reference to the following examples; however, it is to be 
understood that the present invention is not limited to 
such examples. All parts or amounts, unless otherwise 
specified, are by weight. 

In order to facilitate understanding of the following 
examples certain frequently occurring methods and/or terms 
will be described.

"Plasmids" are designated by a lower case p preceded
and/or followed by capital letters and/or numbers. The 
starting plasmids herein are either commercially available, 
publicly available on an unrestricted basis, or can be 
constructed from available plasmids in accord with 
published procedures. In addition, equivalent plasmids to 
those described are known in the art and will be apparent 
to the ordinarily skilled artisan. 

"Digestion" of DNA refers to catalytic cleavage of the 
DNA with a restriction enzyme that acts only at certain 
sequences in the DNA. The various restriction enzymes used 
herein are commercially available and their reaction 
conditions, cofactors and other requirements were used as
would be known to the ordinarily skilled artisan. For
analytical purposes, typically 1 µg of plasmid or DNA
fragment is used with about 2 units of enzyme in about 20
µl of buffer solution. For the purpose of isolating DNA
fragments for plasmid construction, typically 5 to 50 µg of
DNA are digested with 20 to 250 units of enzyme in a
larger volume. Appropriate buffers and substrate amounts
for particular restriction enzymes are specified by the
manufacturer. Incubation times of about 1 hour at 37°C are
ordinarily used, but may vary in accordance with the
supplier's instructions. After digestion the reaction is
electrophoresed directly on a polyacrylamide gel to isolate
the desired fragment.

Size separation of the cleaved fragments is performed
using 8 percent polyacrylamide gel described by Goeddel, D.
et al., Nucleic Acids Res., 8:4057 (1980).

"Oligonucleotides" refers to either a single stranded 
polydeoxynucleotide or two complementary 
polydeoxynucleotide strands which may be chemically 
synthesized. Such synthetic oligonucleotides have no 5' 
phosphate and thus will not ligate to another 
oligonucleotide without adding a phosphate with an ATP in 
the presence of a kinase. A synthetic oligonucleotide will 
ligate to a fragment that has not been dephosphorylated. 

"Ligation" refers to the process of forming 
phosphodiester bonds between two double stranded nucleic 
acid fragments (Maniatis, T., et al., Id., p. 146). Unless
otherwise provided, ligation may be accomplished using
known buffers and conditions with 10 units of T4 DNA ligase
("ligase") per 0.5 µg of approximately equimolar amounts of
the DNA fragments to be ligated.

Unless otherwise stated, transformation was performed 
as described in the method of Graham, F. and Van der Eb, 
A., Virology, 52:456-457 (1973). 

Example 1 
Bacterial Expression of hMLH1

The full length DNA sequence encoding human DNA
mismatch repair protein hMLH1, ATCC # 75649, is initially
amplified using PCR oligonucleotide primers corresponding
to the 5' and 3' ends of the DNA sequence to synthesize
insertion fragments. The 5' oligonucleotide primer has the
sequence 5' CGGGATCCATGTCGTTCGTGGCAGGG 3' (SEQ ID No. 34) and
contains a BamHI restriction enzyme site followed by 18
nucleotides of hMLH1 coding sequence starting from the
initiation codon; the 3' sequence 5'
GCTCTAGATTAACACCTCTCAAAGAC 3' (SEQ ID No. 35) contains
sequences complementary to an XbaI site and lies at the end
of the gene. The restriction enzyme sites correspond to the
restriction enzyme sites on the bacterial expression vector
pQE-9 (Qiagen, Inc., Chatsworth, CA). The plasmid vector encodes
antibiotic resistance (Ampr), a bacterial origin of
replication (ori), an IPTG-regulatable promoter/operator
(P/O), a ribosome binding site (RBS), a 6-histidine tag
(6-His) and restriction enzyme cloning sites. The pQE-9
vector is digested with BamHI and XbaI and the insertion
fragments are then ligated into the pQE-9 vector
maintaining the reading frame initiated at the bacterial
RBS. The ligation mixture is then used to transform the E.
coli strain M15/rep4 (Qiagen, Inc.) which contains multiple
copies of the plasmid pREP4, which expresses the lacI
repressor and also confers kanamycin resistance (Kanr).
Transformants are identified by their ability to grow on LB
plates and ampicillin/kanamycin resistant colonies are
selected. Plasmid DNA is isolated and confirmed by
restriction analysis. Clones containing the desired
constructs are grown overnight (O/N) in liquid culture in
LB media supplemented with both Amp (100 µg/ml) and Kan (25
µg/ml). The O/N culture is used to inoculate a large
culture at a ratio of 1:100 to 1:250. The cells are grown
to an optical density 600 (O.D.600) of between 0.4 and 0.6.
IPTG (isopropyl-β-D-thiogalactopyranoside) is then added
to a final concentration of 1 mM. IPTG induces by
inactivating the lacI repressor, clearing the P/O leading
to increased gene expression. Cells are grown an extra 3
to 4 hours. Cells are then harvested by centrifugation (20
mins at 6000 x g). The cell pellet is solubilized in the
chaotropic agent 6 Molar Guanidine HCl. After
clarification, solubilized hMLH1 is purified from this
solution by chromatography on a Nickel-Chelate column under
conditions that allow for tight binding by proteins
containing the 6-His tag (Hochuli, E. et al., Genetic
Engineering, Principles & Methods, 12:87-98 (1990)).
Protein renaturation out of GnHCl can be accomplished by
several protocols (Jaenicke, R. and Rudolph, R., Protein
Structure - A Practical Approach, IRL Press, New York
(1990)). Initially, step dialysis is utilized to remove
the GnHCl. Alternatively, the purified protein isolated
from the Ni-chelate column can be bound to a second column
over which a decreasing linear GnHCl gradient is run. The
protein is allowed to renature while bound to the column
and is subsequently eluted with a buffer containing 250 mM
Imidazole, 150 mM NaCl, 25 mM Tris-HCl pH 7.5 and 10%
Glycerol. Finally, soluble protein is dialyzed against a
storage buffer containing 5 mM Ammonium Bicarbonate. The
purified protein was analyzed by SDS-PAGE.



Example 2 

Spontaneous Mutation Assay for Detection of the Expression
of hMLH1, hMLH2 and hMLH3 and Complementation to the E. coli
mutL

The pQE9hMLH1, pQE9hMLH2 or pQE9hMLH3/GW3733
transformants were subjected to the spontaneous mutation
assay. The plasmid vector pQE9 was also transformed to
AB1157 (K-12, argE3 hisG4 leuB6 proA2 thr-1 ara-1 rpsL31
supE44 tsx-33) and GW3733 to use as the positive and
negative control respectively.

Fifteen 2 ml cultures, inoculated with approximately 
100 to 1000 E. coli, were grown 2x10*^ cells per ml in LB 
ampicillin medium at 37°C. Ten microliters of each culture
were diluted and plated on the LB ampicillin plates to
measure the number of viable cells. The rest of the cells
from each culture were then concentrated in saline and
plated on minimal plates lacking arginine to measure
reversion to Arg+. In Table 3, the mean number of
mutations per culture (m) was calculated from the median
number (r) of mutants per distribution, according to the
equation (r/m) - ln(m) = 1.24 (Lea et al., J. Genetics
49:264-285 (1949)). Mutation rates per generation were
recorded as m/N, with N representing the average number of
cells per culture.

TABLE 3



Spontaneous Mutation Rates

Strain             Mutation/generation

AB1157+vector      (5.6±0.1) x 10^-9   (a)
GW3733+vector      (1.1±0.2) x 10^-6   (a)
GW3733+phMLH1      (3.7±1.3) x 10^-7   (a)
GW3733+phMLH2      (3.1±0.6) x 10^-7   (b)
GW3733+phMLH3      (2.1±0.8) x 10^-7   (b)

a: Average of three experiments.
b: Average of four experiments.

The functional complementation result showed that the
human mutL can partially rescue the E. coli mutL mutator
phenotype, suggesting that the human mutL is not only
successfully expressed in a bacterial expression system,
but also functions in bacteria.
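
The median relation used for Table 3, (r/m) - ln(m) = 1.24, has no closed-form solution for m, so it has to be solved numerically. The sketch below uses bisection, which is one convenient choice and not necessarily how the values in Table 3 were obtained. The function names and the example inputs are hypothetical and are not data from this document.

```python
import math


def mutations_per_culture(r: float) -> float:
    """Solve (r/m) - ln(m) = 1.24 for m, given the median mutant count r."""
    if r <= 0:
        raise ValueError("median mutant count r must be positive")

    def f(m: float) -> float:
        # Strictly decreasing in m, so there is exactly one root.
        return r / m - math.log(m) - 1.24

    lo, hi = 1e-12, 1.0
    while f(hi) > 0:          # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):      # bisection
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def mutation_rate(r: float, cells_per_culture: float) -> float:
    """Mutation rate per generation, recorded as m/N."""
    return mutations_per_culture(r) / cells_per_culture


# Hypothetical inputs for illustration only.
print(mutation_rate(r=12.0, cells_per_culture=1e9))
```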



Example 3

Chromosomal Mapping of the hMLH1

An oligonucleotide primer set was designed according
to the sequence at the 5' end of the cDNA for hMLH1. This
primer set would span a 94 bp segment. This primer set was
used in a polymerase chain reaction under the following set
of conditions:

30 seconds, 95 degrees C
1 minute, 56 degrees C
1 minute, 70 degrees C

This cycle was repeated 32 times followed by one 5 minute
cycle at 70 degrees C. Human, mouse, and hamster DNA were
used as template in addition to a somatic cell hybrid panel
(Bios, Inc). The reactions were analyzed on either 8%
polyacrylamide gels or 3.5% agarose gels. A 94 base pair
band was observed in the human genomic DNA sample and in
the somatic cell hybrid sample corresponding to chromosome
3. In addition, using various other somatic cell hybrid
genomic DNA, the hMLH1 gene was localized to chromosome 3p.

Example 4 

Method for Determination of mutation of hMLH1 gene in HNPCC
kindred

cDNA was produced from RNA obtained from tissue
samples from persons who are HNPCC kindred and the cDNA was
used as a template for PCR, employing the primers 5'
GCATCTAGACGTTTCCTTGGC 3' (SEQ ID No. 36) and 5'
CATCCAAGCTTCTGTTCCCG 3' (SEQ ID No. 37), allowing
amplification of codons 1 to 394 of Figure 1; 5'
GGGGTGCAGCAGCACATCG 3' (SEQ ID No. 38) and 5'
GGAGGCAGAATGTGTGAGCG 3' (SEQ ID No. 39),
allowing amplification of codons 326 to 729 of Figure 1
(SEQ ID No. 2); and 5' TCCCAAAGAAGGACTTGCT 3' (SEQ ID No.
40) and 5' AGTATAAGTCTTAAGTGCTACC 3' (SEQ ID No. 41),
allowing amplification of codons 602 to 756 plus 128 nt of
3'-untranslated sequences of Figure 1 (SEQ ID No. 2). The
PCR conditions for all analyses used consisted of 35 cycles
at 95°C for 30 seconds, 52-58°C for 60 to 120 seconds, and
70°C for 60 to 120 seconds, in the buffer solution
described in Sidransky, D. et al., Science, 252:706
(1991). PCR products were sequenced using primers labeled
at their 5' end with T4 polynucleotide kinase, employing
SequiTherm Polymerase (Epicentre Technologies). The
intron-exon borders of selected exons were also determined
and genomic PCR products analyzed to confirm the results.
PCR products harboring suspected mutations were then cloned
and sequenced to validate the results of the direct
sequencing. PCR products were cloned into T-tailed vectors
as described in Holton, T.A. and Graham, M.W., Nucleic
Acids Research, 19:1156 (1991) and sequenced with T7
polymerase (United States Biochemical). Affected
individuals from seven kindreds all exhibited a
heterozygous deletion of codons 578 to 632 of the hMLH1
gene. The derivation of five of these seven kindreds could
be traced to a common ancestor. The genomic sequences
surrounding codons 578-632 were determined by
cycle-sequencing of the P1 clones (a human genomic P1 library
which contains the entire hMLH1 gene (Genome Systems))
using SequiTherm Polymerase, as described by the
manufacturer, with primers labeled with T4
polynucleotide kinase, and by sequencing PCR products of
genomic DNA. The primers used to amplify the exon
containing codons 578-632 were 5' TTTATGCrTTCTCACCTGCC 3'
(SEQ ID No. 42) and 5' GTTATCTGCCCACCTCAGC 3' (SEQ ID No.
43). The PCR product included 105 bp of intron C sequence
upstream of the exon and 117 bp downstream. No mutations
in the PCR product were observed in the kindreds, so the
deletion in the RNA was not due to a simple splice site
mutation. Codons 578 to 632 were found to constitute a
single exon which was deleted from the gene product in the
kindreds described above. This exon contains several
highly conserved amino acids.

In a second family (L7), PCR was performed using the
above primers and a 4 bp deletion was observed beginning at
the first nucleotide (nt) of codon 727. This produced a
frame shift with a new stop codon 166 nt downstream,
resulting in a substitution of the carboxy-terminal 29
amino acids of hMLH1 with 53 different amino acids, some
encoded by nt normally in the 3' untranslated region.

A different mutation was found in a different kindred
(L2516) after PCR using the above primers, the mutation
consisting of a 4 bp insert between codons 755 and 756.
This insertion resulted in a frame shift and extension of
the ORF to include 102 nucleotides (34 amino acids)
downstream of the normal termination codon. The mutations
in both kindreds L7 and L2516 were therefore predicted to
alter the C-terminus of hMLH1.
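
How a 4 bp deletion shifts the reading frame and carries translation past the normal stop codon can be shown on a toy sequence. The sketch below is illustrative only: the 22-nucleotide sequence is invented for the example and is not taken from hMLH1, and the helper names are hypothetical. The codon table is the standard genetic code.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cdna: str) -> str:
    """Translate from the first base until a stop codon or the end of the sequence."""
    protein = []
    for i in range(0, len(cdna) - 2, 3):
        residue = CODON_TABLE[cdna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def delete_bases(seq: str, start: int, length: int) -> str:
    """Remove `length` bases beginning at 0-based index `start`."""
    return seq[:start] + seq[start + length:]


# Invented toy cDNA: a short ORF (ATG GCT GAA ACC TAA) followed by 3' untranslated bases.
WILD_TYPE = "ATGGCTGAAACCTAAGGCTTGA"
MUTANT = delete_bases(WILD_TYPE, 3, 4)   # 4 bp deletion starting at codon 2

print(translate(WILD_TYPE))
print(translate(MUTANT))
```

In the toy mutant the original stop codon is no longer in frame, so translation continues into what was untranslated sequence until a new in-frame stop is reached, which is the same kind of outcome described for kindred L7.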

A possible mutation in the hMLH1 gene was determined
from alterations in size of the encoded protein, where
kindreds were too few for linkage studies. The primers
used for coupled transcription-translation of hMLH1 were 5'
GGATCCTAATACGACTCACTATAGGGAGACCACCATGGCATCTAGACGTTTCCCTTGGC
3' (SEQ ID No. 44) and 5' CATCCAAGCTTCTGTTCCCG 3' (SEQ ID
No. 45) for codons 1 to 394 of Figure 1 and 5'
GGATCCTAATACGACTCACTATAGGGAGACCACCATGGGGGTGCAGCAGCACATCG
3' (SEQ ID No. 46) and 5' GGAGGCAGAATGTGTGAGCG 3' (SEQ ID
No. 47) for codons 326 to 729 of Figure 1 (SEQ ID No. 2).
The resultant PCR products had signals for transcription by
T7 RNA polymerase and for the initiation of translation at
their 5' ends. RNA from lymphoblastoid cells of patients
from 18 kindreds was used to amplify two products,
extending from codon 1 to codon 394 or from codon 326 to
codon 729, respectively. The PCR products were then used as
templates in coupled transcription-translation reactions
performed as described by Powell, S.M. et al., New England
Journal of Medicine, 329:1982 (1993), using 40 uCi of
35S-labeled methionine. Samples were diluted in sample
buffer, boiled for five minutes and analyzed by
electrophoresis on sodium dodecyl sulfate-polyacrylamide
gels containing a gradient of 10% to 20% acrylamide. The
gels were dried and subjected to radiography. All samples
exhibited a polypeptide of the expected size, but an
abnormally migrating polypeptide was additionally found in
one case. The sequence of the relevant PCR product was
determined and found to include a 371 bp deletion beginning
at the first nucleotide (nt) of codon 347. This alteration
was present in heterozygous form, and resulted in a frame
shift and a new stop codon 30 nt downstream of codon 346,
thus explaining the truncated polypeptide observed.
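The sense primers above share a common 5' portion that supplies the transcription and translation signals. As a minimal sketch, the helper below splits such a primer into its parts. The function name and dictionary keys are hypothetical, and the T7 promoter string is the commonly cited sequence rather than something stated in this text.

```python
T7_PROMOTER = "TAATACGACTCACTATAGGG"  # commonly cited T7 promoter sequence (assumption)


def dissect_sense_primer(primer: str) -> dict:
    """Split a sense primer into 5' tail, T7 promoter, leader, ATG and downstream part."""
    seq = "".join(primer.split()).upper()   # drop line-wrap spaces
    t7_at = seq.find(T7_PROMOTER)
    if t7_at < 0:
        raise ValueError("no T7 promoter found")
    rest = seq[t7_at + len(T7_PROMOTER):]
    atg_at = rest.find("ATG")
    if atg_at < 0:
        raise ValueError("no ATG found downstream of the promoter")
    return {
        "five_prime_tail": seq[:t7_at],
        "promoter": T7_PROMOTER,
        "leader": rest[:atg_at],
        "start_codon": "ATG",
        "downstream": rest[atg_at + 3:],
    }


if __name__ == "__main__":
    # Sense primer for codons 326 to 729 (SEQ ID No. 46), as printed above.
    parts = dissect_sense_primer(
        "GGATCCTAATACGACTCACTATAGGGAGACCACCATGGG GGTGCAGCAGCACATCG"
    )
    for name, value in parts.items():
        print(f"{name:16s}{value}")
```

Running the same helper on the other sense primers is a quick way to catch an OCR error in the shared promoter portion, since that portion should be identical in each.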

Four colorectal tumor cell lines manifesting
microsatellite instability were examined. One of the four
(cell line H6) showed no normal peptide in this assay and
produced only a short product migrating at 27 kd. The
sequence of the corresponding cDNA was determined and found
to harbor a C to A transversion at codon 252, resulting in
the substitution of a termination codon for serine. In
accord with the translational analyses, no band at the
normal C position was identified in the cDNA or genomic DNA
from this tumor, indicating that it was devoid of a
functional hMLH1 gene.
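The effect of a single-base substitution on a codon can be checked against the standard genetic code. The following is a sketch under that assumption; the function name is hypothetical, and the example codons (TCA to TAA) are those reported for cell line H6.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
PURINES = {"A", "G"}


def describe_substitution(ref_codon: str, alt_codon: str) -> tuple:
    """Return (reference residue, altered residue, substitution class) for two codons."""
    ref, alt = ref_codon.upper(), alt_codon.upper()
    if len(ref) != 3 or len(alt) != 3:
        raise ValueError("codons must be three nucleotides long")
    diffs = [(r, a) for r, a in zip(ref, alt) if r != a]
    if len(diffs) != 1:
        raise ValueError("expected codons differing at exactly one position")
    r, a = diffs[0]
    # Purine<->purine or pyrimidine<->pyrimidine is a transition; otherwise a transversion.
    kind = "transition" if (r in PURINES) == (a in PURINES) else "transversion"
    return CODON_TABLE[ref], CODON_TABLE[alt], kind


if __name__ == "__main__":
    print(describe_substitution("TCA", "TAA"))  # '*' denotes a termination codon
```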

Table 4 sets forth the results of these sequencing
assays. Deletions were found in those people who were
known to have a family history of colorectal cancer.
More particularly, 9 of 10 families showed an hMLH1
mutation.

Table 4 - Summary of Mutations in hMLH1

Sample                 Codon     cDNA Nucleotide Change     Predicted Coding Change

Kindreds F2, F3, F6,   578-632   165 bp deletion            In-frame deletion
F8, F10, F11, F52

Kindred L7             727/728   4 bp deletion              Frameshift and substitution
                                 (TCACACATTC to TCATTCT)    of new amino acids

Kindred L2516          755/756   4 bp insertion             Extension of C-terminus
                                 (GTGTTAA to GTGTTTGTTAA)

Kindred RA             347       371 bp deletion            Frameshift/Truncation

H6 Colorectal Tumor    252       Transversion               Serine to Stop
                                 (TCA to TAA)


Example 5

Bacterial Expression and Purification of hMLH2

The DNA sequence encoding hMLH2, ATCC #75651, is
initially amplified using PCR oligonucleotide primers
corresponding to the 5' and 3' ends of the DNA sequence to
synthesize insertion fragments. The 5' oligonucleotide
primer has the sequence 5' CGGGATCCATGAAACAATTGCCTGCGGC 3'
(SEQ ID No. 48) and contains a BamHI restriction enzyme site
followed by 17 nucleotides of hMLH2 following the
initiation codon. The 3' sequence 5'
GCTCTAGACCAGACTCATGCTGTTTT 3' (SEQ ID No. 49) contains
complementary sequences to an XbaI site and is followed by
18 nucleotides of hMLH2. The restriction enzyme sites
correspond to the restriction enzyme sites on the bacterial
expression vector pQE-9 (Qiagen, Inc., Chatsworth, CA).
pQE-9 encodes antibiotic resistance (Ampr), a bacterial
origin of replication (ori), an IPTG-regulatable promoter
operator (P/O), a ribosome binding site (RBS), a 6-His tag
and restriction enzyme sites. The amplified sequences and
pQE-9 are then digested with BamHI and XbaI. The amplified
sequences are ligated into pQE-9 and are inserted in frame
with the sequence encoding for the histidine tag and the
RBS. The ligation mixture is then used to transform E.
coli strain M15/rep4 (Qiagen, Inc.) which contains multiple
copies of the plasmid pREP4, which expresses the lacI
repressor and also confers kanamycin resistance (Kanr).
Transformants are identified by their ability to grow on LB
plates and ampicillin/kanamycin resistant colonies are
selected. Plasmid DNA is isolated and confirmed by
restriction analysis. Clones containing the desired
constructs are grown overnight (O/N) in liquid culture in
LB media supplemented with both Amp (100 ug/ml) and Kan (25
ug/ml). The O/N culture is used to inoculate a large
culture at a ratio of 1:100 to 1:250. The cells are grown
to an optical density 600 (O.D.600) of between 0.4 and 0.6.
IPTG (isopropyl-B-D-thiogalactopyranoside) is then added
to a final concentration of 1 mM. IPTG induces by
inactivating the lacI repressor, clearing the P/O leading
to increased gene expression. Cells are grown an extra 3
to 4 hours. Cells are then harvested by centrifugation (20
mins at 6000 x g). The cell pellet is solubilized in the
chaotropic agent 6 Molar Guanidine HCl. After
clarification, solubilized hMLH2 is purified from this
solution by chromatography on a Nickel-Chelate column under
conditions that allow for tight binding by proteins
containing the 6-His tag (Hochuli, E. et al., Genetic
Engineering, Principles & Methods, 12:87-98 (1990)).
Protein renaturation out of GnHCl can be accomplished by
several protocols (Jaenicke, R. and Rudolph, R., Protein
Structure - A Practical Approach, IRL Press, New York
(1990)). Initially, step dialysis is utilized to remove
the GnHCl. Alternatively, the purified protein isolated
from the Ni-chelate column can be bound to a second column
over which a decreasing linear GnHCl gradient is run. The
protein is allowed to renature while bound to the column
and is subsequently eluted with a buffer containing 250 mM
Imidazole, 150 mM NaCl, 25 mM Tris-HCl pH 7.5 and 10%
Glycerol. Finally, soluble protein is dialyzed against a
storage buffer containing 5 mM Ammonium Bicarbonate. The
purified protein was analyzed by SDS-PAGE.

Example 6

Bacterial Expression and Purification of hMLH3

The DNA sequence encoding hMLH3, ATCC #75650, is
initially amplified using PCR oligonucleotide primers
corresponding to the 5' and 3' ends of the DNA sequence to
synthesize insertion fragments. The 5' oligonucleotide
primer has the sequence 5' CGGGATCCATGGAGCGAGCTGAGAGC 3'
(SEQ ID No. 50) and contains a BamHI restriction enzyme site
followed by 18 nucleotides of hMLH3 coding sequence
starting from the presumed terminal amino acid of the
processed protein. The 3' sequence 5'
GCTCTAGAGTGAAGACTCTGTCT 3' (SEQ ID No. 51) contains
complementary sequences to an XbaI site and is followed by
18 nucleotides of hMLH3. The restriction enzyme sites
correspond to the restriction enzyme sites on the bacterial
expression vector pQE-9 (Qiagen, Inc., Chatsworth, CA).
pQE-9 encodes antibiotic resistance (Ampr), a bacterial
origin of replication (ori), an IPTG-regulatable promoter
operator (P/O), a ribosome binding site (RBS), a 6-His tag
and restriction enzyme sites. The amplified sequences and
pQE-9 are then digested with BamHI and XbaI. The amplified
sequences are ligated into pQE-9 and are inserted in frame
with the sequence encoding for the histidine tag and the
RBS. The ligation mixture is then used to transform E.
coli strain M15/rep4 (Qiagen, Inc.) which contains multiple
copies of the plasmid pREP4, which expresses the lacI
repressor and also confers kanamycin resistance (Kanr).
Transformants are identified by their ability to grow on LB
plates and ampicillin/kanamycin resistant colonies are
selected. Plasmid DNA is isolated and confirmed by
restriction analysis. Clones containing the desired
constructs are grown overnight (O/N) in liquid culture in
LB media supplemented with both Amp (100 ug/ml) and Kan (25
ug/ml). The O/N culture is used to inoculate a large
culture at a ratio of 1:100 to 1:250. The cells are grown
to an optical density 600 (O.D.600) of between 0.4 and 0.6.
IPTG (isopropyl-B-D-thiogalactopyranoside) is then added
to a final concentration of 1 mM. IPTG induces by
inactivating the lacI repressor, clearing the P/O leading
to increased gene expression. Cells are grown an extra 3
to 4 hours. Cells are then harvested by centrifugation (20
mins at 6000 x g). The cell pellet is solubilized in the
chaotropic agent 6 Molar Guanidine HCl. After
clarification, solubilized hMLH3 is purified from
this solution by chromatography on a Nickel-Chelate column
under conditions that allow for tight binding by proteins
containing the 6-His tag (Hochuli, E. et al., Genetic
Engineering, Principles & Methods, 12:87-98 (1990)).
Protein renaturation out of GnHCl can be accomplished by
several protocols (Jaenicke, R. and Rudolph, R., Protein
Structure - A Practical Approach, IRL Press, New York
(1990)). Initially, step dialysis is utilized to remove
the GnHCl. Alternatively, the purified protein isolated
from the Ni-chelate column can be bound to a second column
over which a decreasing linear GnHCl gradient is run. The
protein is allowed to renature while bound to the column
and is subsequently eluted with a buffer containing 250 mM
Imidazole, 150 mM NaCl, 25 mM Tris-HCl pH 7.5 and 10%
Glycerol. Finally, soluble protein is dialyzed against a
storage buffer containing 5 mM Ammonium Bicarbonate. The
purified protein was analyzed by SDS-PAGE.
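The cloning primers in Examples 5 and 6 each carry a restriction site followed by gene-specific nucleotides. As a small illustrative sketch (the helper name is hypothetical; the recognition sequences GGATCC for BamHI and TCTAGA for XbaI are the standard ones), a primer can be split at its site to count the nucleotides that follow:

```python
RECOGNITION_SITES = {"BamHI": "GGATCC", "XbaI": "TCTAGA"}


def split_at_site(primer: str, enzyme: str) -> tuple:
    """Return (5' clamp, recognition site, remainder) for the first site in a primer."""
    seq = "".join(primer.split()).upper()
    site = RECOGNITION_SITES[enzyme]
    at = seq.find(site)
    if at < 0:
        raise ValueError(f"{enzyme} site not found")
    return seq[:at], site, seq[at + len(site):]


if __name__ == "__main__":
    # hMLH2 5' primer (SEQ ID No. 48): ATG plus the nucleotides after it.
    clamp, site, rest = split_at_site("CGGGATCCATGAAACAATTGCCTGCGGC", "BamHI")
    print(clamp, site, rest, len(rest) - 3)
    # hMLH2 3' primer (SEQ ID No. 49): nucleotides following the XbaI site.
    clamp, site, rest = split_at_site("GCTCTAGACCAGACTCATGCTGTTTT", "XbaI")
    print(clamp, site, rest, len(rest))
```

Comparing the printed counts with the numbers quoted in each example is a simple consistency check on a transcribed primer.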

Example 7

Method for determination of mutation of hMLH2 and hMLH3 in
hereditary cancer

Isolation of Genomic Clones

A human genomic P1 library (Genomic Systems, Inc.) was
screened by PCR using primers selected for the cDNA
sequence of hMLH2 and hMLH3. Two clones were isolated for
hMLH2 using primers 5' AAGCTGCTCTGTTAAAAGCG 3' (SEQ ID No.
52) and 5' GCACCAGCATCCAAGGAG 3' (SEQ ID No. 53),
resulting in a 133 bp product. Three clones were isolated
for hMLH3, using primers 5' CAACCATGAGACACATCGC 3' (SEQ ID
No. 54) and 5' AGGTTAGTGAAGACTCTGTC 3' (SEQ ID No. 55),
resulting in a 121 bp product. Genomic clones were nick-
translated with digoxigenin-deoxyuridine 5'-triphosphate
(Boehringer Mannheim), and FISH was performed as described
(Johnson, Cv. et al., Methods Cell Biol., 35:73-99 (1991)).
Hybridization with the hMLH3 probe was carried out using a
vast excess of human cot-1 DNA for specific hybridization
to the expressed hMLH3 locus. Chromosomes were
counterstained with 4,6-diamino-2-phenylindole and propidium
iodide, producing a combination of C- and R-bands. Aligned
images for precise mapping were obtained using a triple-
band filter set (Chroma Technology, Brattleboro, VT) in
combination with a cooled charge-coupled device camera
(Photometrics, Tucson, AZ) and variable excitation
wavelength filters (Johnson, Cv. et al., Genet. Anal. Tech.
Appl., 8:75 (1991)). Image collection, analysis and
chromosomal fractional length measurements were done using
the ISee Graphical Program System (Inovision Corporation,
Durham, NC).

Transcription Coupled Translation Mutation Analysis

For purposes of IVSP (in vitro synthesized protein)
analysis the hMLH2 gene was divided into three overlapping
segments. The first segment included codons 1 to 500, while
the middle segment included codons 270 to 755, and the last
segment included codons 485 to the translational
termination site at codon 933. The primers for the first
segment were 5'
GGATCCTAATACGACTCACTATAGGGAGACCACCATGGAACAATTGCCTGCGG 3'
(SEQ ID No. 56) and 5' CCTGCTCCACTCATCTGC 3' (SEQ ID No.
57), for the middle segment were 5'
GGATCCTAATACGACTCACTATAGGGAGACCACCATGGAAGATATCTTAAAGTTAATCCG
3' (SEQ ID No. 58) and 5' GGCTTCTTCTACTCTATATGG 3' (SEQ ID
No. 59), and for the final segment were 5'
GGATCCTAATACGACTCACTATAGGGAGACCACCATGGCAGGTCTTGAAAACTCTTCG
3' (SEQ ID No. 60) and 5' AAAACAAGTCAGTGAATCCTC 3'
(SEQ ID No. 61). The primers used for mapping the stop
mutation in patient CW all used the same 5' primer as the
first segment. The 3' nested primers were: 5'
AAGCACATCTGTTTCTGCTG 3' (SEQ ID No. 62), codons 1 to 369; 5'
ACGAGTAGATTCCTTTAGGC 3' (SEQ ID No. 63), codons 1 to 290;
and 5' CAGAACTGACATGAGAGCC 3' (SEQ ID No. 64), codons 1 to
214.
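Because the three hMLH2 segments are meant to tile the whole coding region, it can be useful to confirm that consecutive segments overlap and that no codon is left uncovered. The sketch below does this for the codon ranges quoted above; the function name is hypothetical.

```python
def check_tiling(segments, first_codon, last_codon):
    """Return the overlap (in codons) between consecutive segments.

    Raises ValueError if the segments leave any codon in
    first_codon..last_codon uncovered.
    """
    ordered = sorted(segments)
    if ordered[0][0] > first_codon or max(end for _, end in ordered) < last_codon:
        raise ValueError("segments do not reach both ends of the coding region")
    overlaps = []
    for (start_a, end_a), (start_b, _) in zip(ordered, ordered[1:]):
        if start_b > end_a + 1:
            raise ValueError(f"gap between codons {end_a} and {start_b}")
        overlaps.append(end_a - start_b + 1)
    return overlaps


if __name__ == "__main__":
    hmlh2_segments = [(1, 500), (270, 755), (485, 933)]
    print(check_tiling(hmlh2_segments, 1, 933))  # codons shared by neighbouring segments
```

Overlap between neighbouring segments means a truncating mutation near a segment boundary should still yield a visibly shortened product from at least one segment.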

For analysis of hMLH3, the hMLH3 cDNA was amplified as
a full-length product or as two overlapping segments. The
primers for full-length hMLH3 were 5'
GGATCCTAATACGACTCACTATAGGGAGACCACCATGGAGCGAGCTGAGAGC 3'
(SEQ ID No. 65) and 5' AGGTTAGTGAAGACTTCTGTC 3' (SEQ ID No.
66) (codons 1 to 863). For segment 1, the sense primer was
the same as above and the antisense primer was 5'
CTGAGGTCTCAGCAGGC 3' (SEQ ID No. 67) (codons 1 to 472).
Segment 2 primers were 5'
GGATCCTAATACGACTCACTATAGGGAGACCACCATGGTGTCCATTTCCAGACTGCG
3' (SEQ ID No. 68) and 5' AGGTTAGTGAAGACTCTGTC 3' (SEQ ID
No. 69) (codons 415 to 863). Amplifications were done as
described below.

The PCR products contained recognition signals for
transcription by T7 RNA polymerase and for the initiation
of translation at their 5' ends. PCR products were used as
templates in coupled transcription-translation reactions
containing 40 uCi of 35S-methionine (NEN, Dupont). Samples
were diluted in SDS sample buffer, and analyzed by
electrophoresis on SDS-polyacrylamide gels containing a
gradient of 10 to 20% acrylamide. The gels were fixed,
treated with EnHance (Dupont), dried and subjected to
autoradiography.

RT-PCR and Direct Sequencing of PCR Products

cDNAs were generated from RNA of lymphoblastoid or
tumor cells with Superscript II (Life Technologies). The
cDNAs were then used as templates for PCR. The conditions
for all amplifications were 35 cycles at 95°C for 30s, 52°C
to 62°C for 60 to 120s, and 70°C for 60 to 120s, in buffer.
The PCR products were directly sequenced and cloned into
the T-tailed cloning vector PCR2000 (Invitrogen) and
sequenced with T7 polymerase (United States Biochemical).
For the direct sequencing of PCR products, PCR reactions
were first phenol chloroform extracted and ethanol
precipitated. Templates were directly sequenced using
SequiTherm polymerase (Epicentre Technologies) and gamma-32P
labelled primers as described by the manufacturer.
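For planning purposes, the cycling conditions given above (35 cycles of 30 s denaturation, 60 to 120 s annealing and 60 to 120 s extension) bound the hold time of a run. The sketch below simply sums those step durations; it ignores ramp times and any initial or final holds, which the text does not specify, and the names are hypothetical.

```python
CYCLES = 35
DENATURE_S = (30, 30)    # 95 C step, fixed at 30 s
ANNEAL_S = (60, 120)     # 52 C to 62 C step, 60 to 120 s
EXTEND_S = (60, 120)     # 70 C step, 60 to 120 s


def cycling_time_range(cycles, *steps):
    """Return (shortest, longest) total hold time in seconds over all cycles."""
    shortest = cycles * sum(low for low, _ in steps)
    longest = cycles * sum(high for _, high in steps)
    return shortest, longest


if __name__ == "__main__":
    low, high = cycling_time_range(CYCLES, DENATURE_S, ANNEAL_S, EXTEND_S)
    print(f"{low / 60:.1f} to {high / 60:.1f} minutes of hold time")
```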

Intron/Exon Boundaries and Genomic Analysis of Mutations

Intron/exon borders were determined by cycle-sequencing
P1 clones using gamma-32P end labelled primers
and SequiTherm polymerase as described by the manufacturer.
The primers used to amplify the hMLH2 exon containing
codons 195 to 233 were 5' TTATTTGGCAGAAAAGCAGAG 3' (SEQ ID
No. 70) and 5' TTAAAAGACTAACCTCTTGCC 3' (SEQ ID No. 71),
which produced a 215 bp product. The product was cycle
sequenced using the primer 5' CTGCTGTTATGAACAATATGG 3' (SEQ
ID No. 72). The primers used to analyze the genomic
deletion of hMLH3 in patient GC were: for the 5' region
amplification 5' CAGAAGCAGTTGCAAAGCC 3' (SEQ ID No. 73) and
5' AAACCGTACTCTTCACACAC 3' (SEQ ID No. 74) which produce a
74 bp product containing codons 233 to 257, primers 5'
GAGGAAAAGCTTTTGTTGGC 3' (SEQ ID No. 75) and 5'
CAGTGGCTGCTGACTGAC 3' (SEQ ID No. 76) which produce a 93 bp
product containing the codons 347 to 377, and primers 5'
TCCAGAACCAAGAAGGAGC 3' (SEQ ID No. 77) and 5'
TGAGGTCTCAGCAGGC 3' (SEQ ID No. 78) which produce a 99 bp
product containing the codons 439 to 472 of hMLH3.

TABLE 5

Summary of Mutations in hMLH2 and hMLH3
from patients affected with HNPCC

Sample    Codon        cDNA Nucleotide     Genomic Nucleotide   Predicted Coding
                       Change              Change               Change

hMLH2
CW        233          Skipped exon        CAG to TAG           GLN to stop codon

hMLH3
Ns, TF    20           CGG to CAG          CGG to CAG           ARG to GLN
GC        268 to 669   1,203 bp deletion   Deletion             In-frame deletion
GCx       268 to 669   1,203 bp deletion   Deletion             Frameshift, truncation


Numerous modifications and variations of the present 
invention are possible in light of the above teachings and, 
therefore, within the scope of the appended claims, the invention 
may be practiced otherwise than as particularly described.

(B) TELEFAX: 201-994-1744



(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS
(A) LENGTH: 2525 BASE PAIRS
(B) TYPE: NUCLEIC ACID
(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

GTTGAACATC TAGACGTTTC CTTGGCTCTT CTGGCGCCAA AATGTCGTTC GTGGCAGGGG 60
TTATTCGGCG GCTGGACGAG ACAGTGGTGA ACCGCATCGC GGCGGGGGAA GTTATCCAGC 120
GGCCAGCTAA TGCTATCAAA GAGATGATTG AGAACTGTTT AGATGCAAAA TCCACAAGTA 180
TTCAAGTGAT TGTTAAAGAG GGAGGCCTGA AGTTGATTCA GATCCAAGAC AATGGCACCG 240
GGATCAGGAA AGAAGATCTG GATATTGTAT GTGAAAGTGT CACTACTAGT AAACTGCAGT 300
CCTTTGAGGA TTTAGCCAGT ATTTCTATCT ATGGCTTTCG AGGTGAGGCT TTGGCCAGCA 360
TAAGCCATGT GGCTCATGTT ACTATTACAA CGAAAACAGC TGATGGAAAG TGTGCATACA 420
GAGCAAGTTA CTCAGATGGA AAACTGAAAG CCCCTCCTAA ACCATGTGCT GGCAATCAAG 480
GGACCCAGAT CACGGTGGAG GACCTTTTTT ACAACATAGC CACGAGGAGA AAAGCTTTAA 540
AAAATCCAAG TGAAGAATAT GGGAAAATTT TGGAAGTTGT TGGCAGGTAT TCAGTACACA 600
ATGCAGGCAT TAGTTTCTCA GTTAAAAAAC AAGGAGAGAC AGTAGCTGAT GTTAGGACAC 660
TACCCAATGC CTCAACCGTG GACAATATTC GCTCCGTCTT GGGAAATGCT GTTAGTCGAG 720
AACTGATAGA AATTGGATGT GAGGATAAAA CCCTAGCCTT CAAAATGAAT GGTTACATAT 780
CCAATGCAAA CTACTCAGTG AAGAAGTGCA TCTTCTTACT CTTCATCAAC CATCGTCTGG 840
TAGAATCAAC TTCCTTGAGA AAAGCCATAG AAACAGTGTA TGCAGCCTAT TTGCCAAAAA 900
ACACACACCC ATTCCTGTAC CTCAGTTTAG AAATCAGTCC CCAGAATGTG GATGTTAATG 960
TGAACCCCAC AAAGCATGAA GTTCACTTCC TGCACGAGGA GAGCATCCTG GAGCGGGTGC 1020
AGCAGCACAT CGAGAGCAAG CTCCTGGGCT CCAATTCCTC CAGGATGTAC TTCACCCAGA 1080
CTTTGCTACC AGGACTTGCT GGCCCCTCTG GGGAGATGGT TAAATCCACA ACAAGTCTCA 1140
CCTCGTCTTC TACTTCTGGA AGTAGTGATA AGGTCTATGC CCACCAGATG GTTCGTACAG 1200
ATTCCCGGGA ACAGAAGCTT GATGCATTTC TGCAGCCTCT GAGCAAACCC CTGTCCAGTC 1260
AGCCCCAGGC CATTGTCACA GAGGATAAGA CAGATATTTC TAGTGGCAGG GCTAGGCAGC 1320
AAGATGAGGA GATGCTTGAA CTCCCAGCCC CTGCTGAAGT GGCTGCCAAA AATCAGAGCT 1380
TGGAGGGGGA TACAACAAAG GGGACTTCAG AAATGTCAGA GAAGAGAGGA CCTACTTCCA 1440
GCAACCCCAG AAAGAGACAT CGGGAAGATT CTGATCTCCA AATCCTCGAA GATGATTCCC 1500
GAAAGGAAAT GACTGCAGCT TGTACCCCCC GGAGAAGGAT CATTAACCTC ACTAGTCTTT 1560
TGAGTCTCCA GGAAGAAATT AATGAGCAGG GACATGAGGT TCTCCGGGAG ATGTTGCATA 1620
ACCACTCCTT CGTGGGCTGT GTGAATCCTC AGTGGGCCTT GGCACAGCAT CAAACCAAGT 1680
TATAGCTTCT CAACACCACC AAGCTTAGTG AAGAACTGTT CTACCAGATA CTCATTTATG 1740
ATTTTGCCAA TTTTGGTGTT CTCAGGTTAT CGGAGCCAGC ACCGCTCTTT GACCTTGCCA 1800
TGCTTCCCTT ACATAGTCCA GAGAGTGGCT GGACAGAGGA AGATGGTCCC AAAGAAGGAC 1860
TTGCTGAATA CATTGTTGAG TTTCTGAAGA AGAAGGCTGA GATGCTTGCA GACTATTTCT 1920
CTTTGGAAAT TGATGAGGAA GGGAACCTGA TTGGATTACC CCTTCTGATT GACAACTATG 1980
TGCCCCCTTT GGAGGGACTG CCTATCTTCA TTCTTCCACT AGCCACTGAG GTGAATTGGG 2040
ACGAAGAAAA GGAATGTTTT GAAAGCCTCA GTAAAGAATG CGCTATGTTC TATTCCATCC 2100
GGAAGCAGTA CATATCTGAG GAGTCGACCC TCTCAGGCCA GCAGAGTGAA GTGCCTGGCT 2160
CCATTCCAAA CTCCTGGAAG TGGACTGTGG AACACATTGT CTATAAAGCC TTGCGCTCAC 2220
ACATTCTGCC TCCTAAACAT TCCACAGAAG ATGGAAATAT CCTGCAGCTT GCTAACCTGC 2280
CTGATCTATA CAAAGTCTTT GAGAGGTGTT AAATATGGTT ATTTATGCAC TGTGGGATGT 2340
GTTCTTCTTT CTCTGTATTC CGATACAAAG TGTTGTACTA AAGTGTGATA TACAAAGTGT 2400
ACCAACATAA GTGTTGGTAG CACTTAAGAC TTATACTTGC CTTCTGATAG TATTCCTTTA 2460
TACACAGTGG ATTGATTATA AATAAATAGA TGTGTCTTAA CATAAAAAAA AAAAAAAAAA 2520
AAAAA 2525

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS
(A) LENGTH: 756 AMINO ACIDS
(B) TYPE: AMINO ACID
(C) STRANDEDNESS:
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: PROTEIN

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

Met Ser Phe Val Ala Gly Val Ile Arg Arg Leu Asp Glu Thr Val 15
Val Asn Arg Ile Ala Ala Gly Glu Val Ile Gln Arg Pro Ala Asn 30
Ala Ile Lys Glu Met Ile Glu Asn Cys Leu Asp Ala Lys Ser Thr 45
Ser Ile Gln Val Ile Val Lys Glu Gly Gly Leu Lys Leu Ile Gln 60
Ile Gln Asp Asn Gly Thr Gly Ile Arg Lys Glu Asp Leu Asp Ile 75
Val Cys Glu Arg Phe Thr Thr Ser Lys Leu Gln Ser Phe Glu Asp 90
Leu Ala Ser Ile Ser Thr Tyr Gly Phe Arg Gly Glu Ala Leu Ala 105
Ser Ile Ser His Val Ala His Val Thr Ile Thr Thr Lys Thr Ala 120
Asp Gly Lys Cys Ala Tyr Arg Ala Ser Tyr Ser Asp Gly Lys Leu 135
Lys Ala Pro Pro Lys Pro Cys Ala Gly Asn Gln Gly Thr Gln Ile 150
Thr Val Glu Asp Leu Phe Tyr Asn Ile Ala Thr Arg Arg Lys Ala 165
Leu Lys Asn Pro Ser Glu Glu Tyr Gly Lys Ile Leu Glu Val Val 180
Gly Arg Tyr Ser Val His Asn Ala Gly Ile Ser Phe Ser Val Lys 195
Lys Gln Gly Glu Thr Val Ala Asp Val Arg Thr Leu Pro Asn Ala 210
Ser Thr Val Asp Asn Ile Arg Ser Val Phe Gly Asn Ala Val Ser 225
Arg Glu Leu Ile Glu Ile Gly Cys Glu Asp Lys Thr Leu Ala Phe 240
Lys Met Asn Gly Tyr Ile Ser Asn Ala Asn Tyr Ser Val Lys Lys 255
Cys Ile Phe Leu Leu Phe Ile Asn His Arg Leu Val Glu Ser Thr 270
Ser Leu Arg Lys Ala Ile Glu Thr Val Tyr Ala Ala Tyr Leu Pro 285
Lys Asn Thr His Pro Phe Leu Tyr Leu Ser Leu Glu Ile Ser Pro 300
Gln Asn Val Asp Val Asn Val His Pro Thr Lys His Glu Val His 315
Phe Leu His Glu Glu Ser Ile Leu Glu Arg Val Gln Gln His Ile 330
Glu Ser Lys Leu Leu Gly Ser Asn Ser Ser Arg Met Tyr Phe Thr 345
Gln Thr Leu Leu Pro Gly Leu Ala Ala Pro Ser Gly Glu Met Val 360
Lys Ser Thr Thr Ser Leu Thr Ser Ser Ser Thr Ser Gly Ser Ser 375
Asp Lys Val Tyr Ala His Gln Met Val Arg Thr Asp Ser Arg Glu 390
Gln Lys Leu Asp Ala Phe Leu Gln Pro Leu Ser Lys Pro Leu Ser 405
Ser Gln Pro Gln Ala Ile Val Thr Glu Asp Lys Thr Asp Ile Ser 420
Ser Gly Arg Ala Arg Gln Gln Asp Glu Glu Met Leu Glu Leu Pro 435
Ala Pro Ala Glu Val Ala Ala Lys Asn Gln Ser Leu Glu Gly Asp 450
Thr Thr Lys Gly Thr Ser Glu Met Ser Glu Lys Arg Gly Pro Thr 465
Ser Ser Asn Pro Arg Lys Arg His Arg Glu Asp Ser Asp Val Glu 480
Met Val Glu Asp Asp Ser Arg Lys Glu Met Thr Ala Ala Cys Thr 495
Pro Arg Arg Arg Ile Ile Asn Leu Thr Ser Val Leu Ser Leu Gln 510
Glu Glu Ile Asn Glu Gln Gly His Glu Val Leu Arg Glu Met Leu 525
His Asn His Ser Phe Val Gly Cys Val Asn Pro Gln Trp Ala Leu 540
Ala Gln His Gln Thr Lys Leu Tyr Leu Leu Asn Thr Thr Lys Leu 555
Ser Glu Glu Leu Phe Tyr Gln Ile Leu Ile Tyr Asp Phe Ala Asn 570
Phe Gly Val Leu Arg Leu Ser Glu Pro Ala Pro Leu Phe Asp Leu 585
Ala Met Leu Ala Leu Asp Ser Pro Glu Ser Gly Trp Thr Glu Glu 600
Asp Gly Pro Lys Glu Gly Leu Ala Glu Tyr Ile Val Glu Phe Leu 615
Lys Lys Lys Ala Glu Met Leu Ala Asp Tyr Phe Ser Leu Glu Ile 630
Asp Glu Glu Gly Asn Leu Ile Gly Leu Pro Leu Leu Thr Asp Asn 645
Tyr Val Pro Pro Leu Glu Gly Leu Pro Ile Phe Ile Leu Arg Leu 660
Ala Thr Glu Val Asn Trp Asp Glu Glu Lys Glu Cys Phe Glu Ser 675
Leu Ser Lys Glu Cys Ala Met Phe Tyr Ser Ile Arg Lys Gln Tyr 690
Ile Ser Glu Glu Ser Thr Leu Ser Gly Gln Gln Ser Glu Val Pro 705
Gly Ser Ile Pro Asn Ser Trp Lys Trp Thr Val Glu His Ile Val 720
Tyr Lys Ala Leu Arg Ser His Ile Leu Pro Pro Lys His Phe Thr 735
Glu Asp Gly Asn Ile Leu Gln Leu Ala Asn Leu Pro Asp Leu Tyr 750
Lys Val Phe Glu Arg Cys 756
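Three-letter residue listings such as the one above are often easier to compare or search once converted to one-letter code. The following is a minimal sketch using the standard amino acid abbreviations; the function name is hypothetical.

```python
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(residues: str) -> str:
    """Convert whitespace-separated three-letter residues to a one-letter string."""
    # capitalize() tolerates case slips such as "val" or "SER";
    # an unknown token raises KeyError rather than being silently skipped.
    return "".join(THREE_TO_ONE[token.capitalize()] for token in residues.split())


if __name__ == "__main__":
    print(to_one_letter("Lys Val Phe Glu Arg Cys"))  # final six residues of the listing
```

Raising on an unknown token is deliberate here, since OCR misreadings of three-letter codes would otherwise pass unnoticed.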

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS
(A) LENGTH: 3063 BASE PAIRS
(B) TYPE: NUCLEIC ACID
(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

GGCACGAGTG GCTGCTTGCG GCTAGTGGAT GGTAATTGCC TGCCTCGCGC TAGCAGCAAG 60
CTGCTCTGTT AAAAGCGAAA ATGAAACAAT TGCCTGCGGC AACAGTTCGA CTCCTTTCAA 120
GTTCTCAGAT CATCACTTCG GTGGTCAGTG TTGTAAAAGA GCTTATTGAA AACTCCTTGG 180
ATGCTGGTGC CACAAGCGTA GATGTTAAAC TGGAGAACTA TGGATTTGAT AAAATTGAGG 240
TGCGAGATAA CGGGGAGGGT ATCAAGGCTG TTGATGCACC TGTAATGGCA ATGAAGTACT 300
ACACCTCAAA AATAAATAGT CATGAAGATC TTGAAAATTT GACAACTTAC GGTTTTCGTG 360
GAGAAGCCTT GGGGTCAATT TGTTGTATAG CTGAGGTTTT AATTACAACA AGAACGGCTG 420
CTGATAATTT TAGCACCCAG TATGTTTTAG ATGGCAGTGG CCACATACTT TCTCAGAAAC 480
CTTCACATCT TGGTCAAGGT ACAACTGTAA CTGCTTTAAG ATTATTTAAG AATCTACCTG 540
TAAGAAAGCA GTTTTACTCA ACTGCAAAAA AATGTAAAGA TGAAATAAAA AAGATCCAAG 600
ATCTCCTCAT GAGCTTTGGT ATCCTTAAAC CTGACTTAAG GATTGTCTTT GTACATAACA 660
AGGCAGTTAT TTGGCAGAAA AGCAGAGTAT CAGATCACAA GATGGCTCTC ATGTCAGTTC 720
TGGGGACTGC TGTTATGAAC AATATGGAAT CCTTTCAGTA CCACTCTGAA GAATCTCAGA 780
TTTATCTCAG TGGATTTCTT CCAAAGTGTG ATGCAGACCA CTCTTTCACT AGTCTTTCAA 840
CACCAGAAAG AAGTTTCATC TTCATAAACA GTCGACCAGT ACATCAAAAA GATATCTTAA 900
AGTTAATCCG ACATCATTAC AATCTGAAAT GCCTAAAGGA ATCTACTCGT TTGTATCCTG 960
TTTTCTTTCT GAAAATCGAT GTTCCTACAG CTGATGTTGA TGTAAATTTA ACACCAGATA 1020
AAAGCCAAGT ATTATTACAA AATAAGGAAT CTGTTTTAAT TGCTCTTGAA AATCTGATGA 1080
CGACTTGTTA TGGACCATTA CCTAGTACAA ATTCTTATGA AAATAATAAA ACAGATGTTT 1140
CCGCAGCTGA CATCGTTCTT AGTAAAACAG CAGAAACAGA TGTGCTTTTT AATAAAGTGG 1200
AATCATCTGG AAAGAATTAT TCAAATGTTG ATACTTCAGT CATTCCATTC CAAAATGATA 1260
TGCATAATGA TGAATCTGGA AAAAACACTG ATGATTGTTT AAATCACCAG ATAAGTATTG 1320
GTGACTTTGG TTATGGTCAT TGTAGTAGTG AAATTTCTAA CATTGATAAA AACACTAAGA 1380
ATGCATTTCA GGACATTTCA ATGAGTAATG TATCATGGGA GAACTCTCAG ACGGAATATA 1440
GTAAAACTTG TTTTATAAGT TCCGTTAAGC ACACCCAGTC AGAAAATGGC AATAAAGACC 1500
ATATAGATGA GAGTGGGGAA AATGAGGAAG AAGCAGGTCT TGAAAACTCT TCGGAAATTT 1560
CTGCAGATGA GTGGAGCAGG GGAAATATAC TTAAAAATTC AGTGGGAGAG AATATTGAAC 1620
CTGTGAAAAT TTTAGTGCCT GAAAAAAGTT TACCATGTAA AGTAAGTAAT AATAATTATC 1680
CAATCCCTGA ACAAATGAAT CTTAATGAAG ATTCATGTAA CAAAAAATCA AATGTAATAG 1740
ATAATAAATC TGGAAAAGTT ACAGCTTATG ATTTACTTAG CAATCGAGTA ATCAAGAAAC 1800
CCATGTCAGC AACTGCTCTT TTTGTTCAAG ATCATCGTCC TCAGTTTCTC ATAGAAAATC 1860
CTAAGACTAG TTTAGAGGAT GCAACACTAC AAATTGAAGA ACTGTGGAAG ACATTGAGTG 1920
AAGAGGAAAA ACTGAAATAT GAAGAGAAGG CTACTAAAGA CTTGGNACGA TACAATAGTC 1980
AAATGAAGAG AGCCATTGAA CAGGAGTCAC AAATGTCACT AAAAGATGGC AGAAAAAAGA 2040
TAAAACCCAC CAGCGCATGG AATTTGGCCC AGAAGCACAA GTTAAAAACC TCATTATCTA 2100
ATCAACCANA ACTTGATGAA CTCCTTCAGT CCCAAATTGA AAAAAGAAGG AGTCAAAATA 2160
TTAAAATGGT ACAGATCCCC TTTTCTATGA AAAACTTAAA AATAAATTTT AAGAAACAAA 2220
ACAAAGTTGA CTTAGAAGAG AAGGATGAAC CTTGCTTGAT CCACAATCTC AGGTTTCCTG 2280
ATGCATGGCT AATGACATCC AAAACAGAGG TAATGTTATT AAATCCATAT AGAGTAGAAG 2340
AAGCCCTGCT ATTTAAAAGA CTTCTTGAGA ATCATAAACT TCCTGCAGAG CCACTGGAAA 2400
AGCCAATTAT GTTAACAGAG AGTCTTTTTA ATGGATCTCA TTATTTAGAC GTTTTATATA 2460
AAATGACAGC AGATGACCAA AGATACAGTG GATCAACTTA CCTGTCTGAT CCTCGTCTTA 2520
CAGCGAATGG TTTCAAGATA AAATTGATAC CAGGAGTTTC AATTACTGAA AATTACTTGG 2580
AAATAGAAGG AATGGCTAAT TGTCTCCCAT TCTATGGAGT AGCAGATTTA AAAGAAATTC 2640
TTAATGCTAT ATTAAACAGA AATGCAAAGG AAGTTTATGA ATGTAGACCT CGCAAAGTGA 2700
TAAGTTATTT AGAGGGAGAA GCAGTGCGTC TATCCAGACA ATTACCCATG TACTTATCAA 2760
AAGAGGACAT CCAAGACATT ATCTACAGAA TGAAGCACCA GTTTGGAAAT GAAATTAAAG 2820
AGTGTGTTCA TGGTCGCCCA TTTTTTCATC ATTTAACCTA TCTTCCAGAA ACTACATGAT 2880
TAAATATGTT TAAGAAGATT AGTTACCATT GAAATTGGTT CTGTCATAAA ACAGCATGAG 2940
TCTGGTTTTA AATTATCTTT GTATTATGTG TCACATGGTT ATTTTTTAAA TGAGGATTCA 3000
CTGACTTGTT TTTATATTGA AAAAAGTTCC ACGTATTGTA GAAAACGTAA ATAAACTAAT 3060
AAC 3063
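Nucleotide listings in this layout (blocks of ten bases with a running position counter at the end of each line) can be checked mechanically, since each counter should equal the number of bases read so far. The sketch below is one way to do that; the function name is hypothetical, and the two sample lines are the first two lines of SEQ ID NO: 3.

```python
def parse_listing(lines):
    """Join block-formatted sequence lines, verifying each trailing position counter."""
    parts = []
    total = 0
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines between rows
        counter = int(tokens.pop()) if tokens[-1].isdigit() else None
        block = "".join(tokens).upper()
        parts.append(block)
        total += len(block)
        if counter is not None and counter != total:
            raise ValueError(f"counter {counter} does not match running length {total}")
    return "".join(parts)


if __name__ == "__main__":
    sample = [
        "GGCACGAGTG GCTGCTTGCG GCTAGTGGAT GGTAATTGCC TGCCTCGCGC TAGCAGCAAG 60",
        "",
        "CTGCTCTGTT AAAAGCGAAA ATGAAACAAT TGCCTGCGGC AACAGTTCGA CTCCTTTCAA 120",
    ]
    seq = parse_listing(sample)
    print(len(seq), seq[80:92])  # length read, and the bases from position 81 onward
```

A counter mismatch points at the line where a base was dropped or duplicated during transcription.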



(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS
(A) LENGTH: 931 AMINO ACIDS
(B) TYPE: AMINO ACID
(C) STRANDEDNESS:
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: PROTEIN

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

Met Lys Gln Leu Pro Ala Ala Thr Val Arg Leu Leu Ser Ser Ser 15
Gln Ile Ile Thr Ser Val Val Ser Val Val Lys Glu Leu Ile Glu 30
Asn Ser Leu Asp Ala Gly Ala Thr Ser Val Asp Val Lys Leu Glu 45
Asn Tyr Gly Phe Asp Lys Ile Glu Val Arg Asp Asn Gly Glu Gly 60
Ile Lys Ala Val Asp Ala Pro Val Met Ala Met Lys Tyr Tyr Thr 75
Ser Lys Ile Asn Ser His Gly Asp Leu Glu Asn Leu Thr Thr Tyr 90
Gly Phe Arg Gly Glu Ala Leu Gly Ser Ile Cys Cys Ile Ala Glu 105
Val Leu Ile Thr Thr Arg Thr Ala Ala Asp Asn Phe Ser Thr Gln 120
Tyr Val Leu Asp Gly Ser Gly His Ile Leu Ser Gln Lys Pro Ser 135
His Leu Gly Gln Gly Thr Thr Val Thr Ala Leu Arg Leu Phe Lys 150
Asn Leu Pro Val Arg Lys Gln Phe Tyr Ser Thr Ala Lys Lys Cys 165
Lys Asp Glu Ile Lys Lys Ile Gln Asp Leu Leu Met Ser Phe Gly 180
Ile Leu Lys Pro Asp Leu Arg Ile Val Phe Val His Asn Lys Ala 195
Val Ile Trp Gln Lys Ser Arg Val Ser Asp His Lys Met Ala Leu 210

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200 






Met 


Ser 


val 


Leu 


Gly 
215 


Thr 


Ala 


Gin 


Tyr 


His 


Ser 


Glu 
230 


Glu 


Ser 


Pro 


Lys 


Cys 


Asp 


Ala 
245 


Asp 


His 


Glu 


Arg 


Ser 


Phe 


He 


Phe 


He 








260 






Asp 


He 


Leu 


Lys 


Leu 
275 


He 


Arg 


Lys 


Glu 


Ser 


Thr 


Arg 
290 


Leu 


Tyr 


Val 


Pro 


Thr 


Ala 


Asp 

305 


Val 


Asp 


Gin 


Val 


Leu 


Leu 


Gin 
320 


Asn 


Lys 


Asn 


Leu 


Met 


Thr 


Thr 

335 


Cys 


Tyr 


Tyr 


Glu 


Asn 


Asn 


Lys 
350 


Thr 


Asp 


Ser 


Lys 


Thr 


Ala 


Glu 
365 


Thr 


Asp 


Ser 


Gly 


Lys 


Asn 


Tyr 
380 


Ser 


Asn 


Gin 


Asn 


Asp 


Met 


His 
395 


Asn 


Asp 


Cys 


Leu 


Asn 


His 


Gin 

410 


He 


Ser 


Cys 


Ser 


ser 


Glu 


He 


Ser 


Asn 








425 






Phe 


Gin 


Asp 


He 


Ser 
440 


Met 


Ser 


Thr 


Glu 


Tyr 


Ser 


Lys 
455 


Thr 


Cys 


Gin 


Ser 


Glu 


Asn 


Gly 
470 


Asn 


Lys 


Asn 


Glu 


Glu 


Glu 


Ala 
485 


Gly 


Leu 


Asp 


Glu 


Trp 


Ser 


Arg 

500 


Gly 


Asn 


Asn 


He 


Glu 


Pro 


Val 

515 


Lys 


He 


Cys 


Lys 


Val 


Ser 


Asn 
530 


Asn 


Asn 


Leu 


Asn 


Glu 


Asp 


Ser 
545 


Cys 


Asn 


Lys 


Ser 


Gly 


Lys 


Val 
560 


Thr 


Ala 


He 


Lys 


Lys 


Pro 


Met 

575 


Ser 


Ala 


Arg 


Pro 


Gin 


Phe 


Leu 


He 


Glu 























205 










210 


Val 


Met 


Asn 


Asn 


Met 


Glu 


Ser 


Phe 






220 










225 


Gin 


He 


Tyr 


Leu 


Ser 


Gly 


Phe 


Leu 






235 










240 


Ser 


Phe 


Thr 


Ser 


Leu 


Ser 


Thr 


Pro 






250 










255 


Asn 


Ser 


Arg 


Pro 


Val 


His 


Gin 


Lys 






265 










270 


His 


His 


Tyr 


Asn 


Leu 


Lys 


Cys 


Leu 






280 










285 


Pro 


Val 


Phe 


Phe 


Leu 


Lys 


He 


Asp 






295 










300 


Val 


Asn 


Leu 


Thr 


Pro 


Asp 


Lys 


Ser 






310 










315 


Glu 


Ser 


Val 


Leu 


He 


Ala 


Leu 


Glu 






325 










330 


Gly 


Pro 


Leu 


Pro 


Ser 


Thr 


Asn 


Ser 






340 










345 


Val 


Ser 


Ala 


Ala 


Asp 


He 


Val 


Leu 






355 










360 


Val 


Leu 


Phe 


Asn 


Lys 


Val 


Glu 


Ser 






370 










375 


Val 


Asp 


Thr 


Ser 


Val 


He 


Pro 


Phe 




385 










390 


Glu 


Ser 


Gly 


Lys 


Asn 


Thr 


Asp 


Asp 






400 










405 


He 


Gly 


Asp 


Phe 


Gly Tyr 


Gly His 






415 










420 


He 


Asp 


Lys 


Asn 


Thr 


Lys 


Asn 


Ala 






430 










435 


Asn 


Val 


Ser 


Trp 


Glu 


Asn 


Ser 


Gin 






445 










450 


Phe 


He 


Ser 


Ser 


Val 


Lys 


His 


Thr 






460 










465 


Asp 


His 


He 


Asp 


Glu 


Ser 


Gly 


Glu 






475 










480 


Glu 


Asn 


Ser 


Ser 


Glu 


He 


Ser 


Ala 






490 










495 


He 


Leu 


Lys 


Asn 


Ser 


Val 


Gly Glu 






505 










510 


Leu 


Val 


Pro 


Glu 


Lys 


Ser 


Leu 


Pro 






520 










525 


Tyr 


Pro 


He 


Pro 


Glu 


Gin 


Met 


Asn 






535 










540 


Lys 


Lys 


Ser 


Asn 


val 


He 


Asp 


Asn 






550 










555 


Tyr 


Asp 


Leu 


Leu 


Ser 


Asn 


Arg 


Val 






565 










570 


Ser 


Ala 


Leu 


Phe 


Val 


Gin Asp 


His 






560 










585 


Asn 


Pro 


Lys 


Thr 


Ser 


Leu 


Glu 


Asp 
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590 










595 










600 


Ala 


Thr 


Leu 


Gin 


He 


Glu 


Glu 


Leu 


Trp 


Lys 


Thr 


Leu 


Ser 


Glu 


Glu 








605 










610 










615 


Glu 


Lys 


Leu 


Lys 


Tyr 


Glu 


Glu 


Lys 


Ala 


Thr 


Lys 


Asp 


Leu 


Xaa 


Arg 






620 










625 










630 


Tyr 


Asn 


Ser 


Gin 


Met 


Lys 


Arg 


Ala 


He 


Glu 


Gin 


Glu 


Ser 


Gin 


Met 








635 










640 










645 


Ser 


Leu 


Lys 


Asp 


Gly 


Arg 


Lys 


Lys 


He 


Lys 


Pro 


Thr 


Ser 


Ala 


Trp 








650 










655 










660 


Asn 


Leu 


Ala 


Gin 


Lys 
665 


His 


Lys 


Leu 


Lys 


Thr 
670 


Ser 


Leu 


Ser 


Asn 


Gin 
675 


Pro 


xaa 


Leu 


Asp 


Glu 


Leu 


Leu 


Gin 


Ser 


Gin 


He 


Glu 


Lys 


Arg 


Arg 








680 










685 










690 


Ser 


Gin 


Asn 


He 


Lys 
695 


Met 


Val 


Gin 


He 


Pro 

700 


Phe 


Ser 


Met 


Lys 


Asn 

705 


Leu 


Lys 


He 


Asn 


Phe 


Lys 


Lys 


Gin 


Asn 


Lys 


Val 


Asp 


Leu 


Glu 


Glu 








710 










715 










720 


Lys 


Asp 


Glu 


Pro 


Cys 


Leu 


He 


His 


Asn 


Leu 


Arg 


Phe 


Pro 


Asp 


Ala 






725 










730 










735 


Trp 


Leu 


Met 


Thr 


Ser 


Lys 


Thr 


Glu 


Val 


Met 


Leu 


Leu 


Asn 


Pro 


Tyr 








740 










745 










750 


Arg 


Val 


Glu 


Glu 


Ala 


Leu 


Leu 


Phe 


Lys 


Arg 


Leu 


Leu 


Glu 


Asn 


His 








755 










760 










765 


Lys 


Leu 


Pro 


Ala 


Glu 


Pro 


Leu 


Glu 


Lys 


Pro 


He 


Met 


Leu 


Thr 


Glu 








770 










775 










780 


Ser 


Leu 


Phe 


Asn 


Gly 
785 


Ser 


His 


Tyr 


Leu 


Asp 
790 


Val 


Leu 


Tyr 


Lys 


Met 
795 


Thr 


Ala 


Asp 


Asp 


Gin 


Arg 


Tyr 


Ser 


Gly 


Ser 


Thr 


Tyr 


Leu 


Ser 


Asp 






800 










805 










810 


Pro 


Arg 


Leu 


Thr 


Ala 


Asn 


Gly 


Phe 


Lys 


He 


Lys 


Leu 


He 


Pro 


Gly 








815 










820 










825 


Val 


Ser 


He 


Thr 


Glu 


Asn 


Tyr 


Leu 


Glu 


He 


Glu 


Gly 


Met 


Ala 


Asn 










830 








835 










840 


Cvs 


Leu 


Pro 


Phe 


Tyr 


Gly 


val 


Ala 


Asp 


Leu 


Lys 


Glu 


He 


Leu 


Asn 








845 










850 










855 


Ala 


He 


Leu 


Asn 


Arg 
860 


Asn 


Ala 


Lys 


Glu 


val 

865 


Tyr 


Glu 


Cys 


Arg 


Pro 
870 


Arg 


Lys 


Val 


He 


Ser 


Tyr 


Leu 


Glu 


Gly 


Glu 


Ala 


Val 


Arg 


Leu 


Ser 






875 










880 










885 


Arg 


Gin 


Leu 


Pro 


Met 


Tyr 


Leu 


Ser 


Lys 


Glu 


Asp 


He 


Gin 


Asp 


He 








890 










895 










900 


He 


Tyr 


Arg 


Met 


Lys 


His 


Gin 


Phe 


Gly 


Asn 


Glu 


He 


Lys 


Glu 


Cys 






905 










910 










915 


val 


His 


Gly 


Arg 


Pro 


Phe 


Phe 


His 


His 


Leu 


Thr 


Tyr 


Leu 


Pro 


Glu 






920 










925 










930 



Thr 



(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 2771 BASE PAIRS

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:



CGAGGCGGAT CGGGTGTTGC ATCCATGGAG CGAGCTGAGA GCTCGAGTAC AGAACCTGCT 60

AAGGCCATCA AACCTATTGA TCGGAAGTCA GTCCATCAGA TTTGCTCTGG GCAGGTGGTA 120

CTGAGTCTAA GCACTGCGGT AAAGGAGTTA GTAGAAAACA GTCTGGATGC TGGTGCCACT 180

AATATTGATC TAAAGCTTAA GGACTATGGA GTGGATCTTA TTGAAGTTTC AGACAATGGA 240

TGTGGGGTAG AAGAAGAAAA CTTCGAAGGC TTAACTCTGA AACATCACAC ATCTAAGATT 300

CAAGAGTTTG CCGACCTAAC TCAGGTTGAA ACTTTTGGCT TTCGGGGGGA AGCTCTGAGC 360

TCACTTTGTG CACTGAGCGA TGTCACCATT TCTACCTGCC ACGCATCGGC GAAGGTTGGA 420

ACTCGACTGA TGTTTGATCA CAATGGGAAA ATTATCCAGA AAACCCCCTA CCCCCGCCCC 480

AGAGGGACCA CAGTCAGCGT GCAGCAGTTA TTTTCCACAC TACCTGTGCG CCATAAGGAA 540

TTTCAAAGGA ATATTAAGAA GGAGTATGCC AAAATGGTCC AGGTCTTACA TGCATACTGT 600

ATCATTTCAG CAGGCATCCG TGTAAGTTGC ACCAATCAGC TTGGACAAGG AAAACGACAG 660

CCTGTGGTAT GCACAGGTGG AAGCCCCAGC ATAAAGGAAA ATATCGGCTC TGTGTTTGGG 720

CAGAAGCAGT TGCAAAGCCT CATTCCTTTT GTTCAGCTGC CCCCTAGTGA CTCCGTGTGT 780

GAAGAGTACG GTTTGAGCTG TTCGGATGCT CTGCATAATC TTTTTTACAT CTCAGGTTTC 840

ATTTCACAAT GCACGCATGG AGTTGGAAGG AGTTCAACAG ACAGACAGTT TTTCTTTATC 900

AACCGGCGGC CTTGTGACCC AGCAAAGGTC TGCAGACTCG TGAATGAGGT CTACCACATG 960

TATAATCGAC ACCAGTATCC ATTTGTTGTT CTTAACATTT CTGTTGATTC AGAATGCGTT 1020

GATATCAATG TTACTCCAGA TAAAAGGCAA ATTTTGCTAC AAGAGGAAAA GCTTTTGTTG 1080

GCAGTrTTAA AGACCTCTTT GATAGGAATG TTTGATAGTG ATGTCAACAA GCTAAATGTC 1140

AGTCAGCAGC CACTGCTGGA TGTTGAAGGT AACTTAATAA AAATGCATGC AGCGGATTTG 1200

GAAAAGCCCA TGGTAGAAAA GCAGGATCAA TCCCCTTCAT TAAGGACTGG AGAAGAAAAA 1260

AAAGACGTGT CCATTTCCAG ACTGCGAGAG GCCTTTTCTC TTCGTCACAC AACAGAGAAC 1320

AAGCCTCACA GCCCAAAGAC TCCAGAACCA AGAAGGAGCC CTCTAGGACA GAAAAGGGGT 1380

ATGCTGTCTT CTAGCACTTC AGGTGCCATC TCTGACAAAG GCGTCCTGAG ACCTCAGAAA 1440

GAGGCAGTGA GTTCCAGTCA CGGACCCAGT GACCCTACGG ACAGAGCGGA GGTGGAGAAG 1500

GACTCGGGGC ACGGCAGCAC TTCCGTGGAT TCTGAGGGGT TCAGCATCCC AGACACGGGC 1560

AGTCACTGCA GCAGCGAGTA TGCGGCCAGC TCCCCAGGGG ACAGGGGCTC GCAGGAACAT 1620

GTGGACTCTC AGGAGAAAGC GCCTGAAACT GACGACTCTT TTTCAGATGT GGACTGCCAT 1680

TCAAACCAGG AAGATACCGG ATGTAAATTT CGAGTTTTGC CTCAGCCAAC TAATCTCGCA 1740

ACCCCAAACA CAAAGCGTTT TAAAAAAGAA GAAATTCTTT CCAGTTCTGA CATTTGTCAA 1800

AAGTTAGTAA ATACTCAGGA CATGTCAGCC TCTCAGGTTG ATGTAGCTGT GAAAATTAAT 1860

AAGAAAGTTG TGCCCCTGGA CTTTTCTATG AGTTCTTTAG CTAAACGAAT AAAGCAGTTA 1920

CATCATGAAG CACAGCAAAG TGAAGGGGAA CAGAATTACA GGAAGTTTAG GGCAAAGATT 1980

TGTCCTGGAG AAAATCAAGC AGCCGAAGAT GAACTAAGAA AAGAGATAAG TAAAACGATG 2040

TTTGCAGAAA TGGAAATCAT TGGTCAGTTT AACCTGGGAT TTATAATAAC CACACTGAAT 2100

GAGGATATCT TCATAGTGGA CCAGCATGCC ACGGACGAGA AGTATAACTT CGAGATGCTG 2160

CAGCAGCACA CCGTGCTCCA GGGGCAGACG CTCATAGCAC CTCAGACTCT CAACTTAACT 2220

GCTGTTAATG AAGCTGTTCT GATAGAAAAT CTGGAAATAT TTAGAAAGAA TGGCTTTGAT 2280

TTTCTTATCG ATGAAAATGC TCCAGTCACT GAAAGGGCTA AACTGATTTC CTTGCCAACT 2340

AGTAAAAACT GGACCTTCGG ACCCCAGGAC GTCGATGAAC TGATCTTCAT GCTGAGCGAC 2400

AGCCCTGGGG TCATGTGCCG GCCTTCCCGA GTCAAGCAGA TGTTTGCCTC CAGAGCCTGC 2460

CGGAAGTCGG TGATGATTGG GACTGCTCTT AACACAAGCG AGATGAAGAA ACTGATCACC 2520

CACATGGGGG AGATGGACCA CCCCTGGAAC TGTCCCCATG GAAGGCCAAC CATGAGACAC 2580

ATCGCCAACC TGGGTGTCAT TTCTCAGAAC TGACCGTAGT CACTGTATGG AATAATTGGT 2640

TTTATCGCAG ATTTTTATGT TTTGAAAGAC AGAGTCTTCA CTAACCTTTT TTGTTTTAAA 2700

ATGAAACCTG CTACTTAAAA AAAATACACA TCACACCCAT TTAAAAGTGA TCTTGAGAAC 2760

CTTTTCAAAC C 2771
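
The relationship between the cDNA of SEQ ID NO:5 and the protein of SEQ ID NO:6 can be spot-checked by translating the opening of the nucleotide sequence. The following is only a minimal sketch and not part of the original listing: the function name `translate` and the choice to begin at the first ATG are assumptions made for illustration, and the table is the standard genetic code.

```python
# Illustrative sketch (not part of the original listing): translate the start
# of the SEQ ID NO:5 cDNA with the standard genetic code.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate from the first ATG until a stop codon or the end of input."""
    seq = "".join(dna.split()).upper()  # drop the spacing between 10-base blocks
    start = seq.find("ATG")             # assumption: first ATG is the start codon
    if start < 0:
        return ""
    protein = []
    for pos in range(start, len(seq) - 2, 3):
        residue = CODON_TABLE.get(seq[pos:pos + 3], "X")  # X marks ambiguous codons
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# First 120 nucleotides of SEQ ID NO:5 as listed above.
SEQ5_START = (
    "CGAGGCGGAT CGGGTGTTGC ATCCATGGAG CGAGCTGAGA GCTCGAGTAC AGAACCTGCT "
    "AAGGCCATCA AACCTATTGA TCGGAAGTCA GTCCATCAGA TTTGCTCTGG GCAGGTGGTA"
)
print(translate(SEQ5_START))
```

Under these assumptions the printed peptide begins MERAESSSTEPAK, which agrees with the first residues listed for SEQ ID NO:6 (Met Glu Arg Ala Glu Ser Ser Ser Thr Glu Pro Ala Lys).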



(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 862 AMINO ACIDS

(B) TYPE: AMINO ACID

(C) STRANDEDNESS:

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: PROTEIN

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:



Met Glu Arg Ala Glu Ser Ser Ser Thr Glu Pro Ala Lys Ala Ile
                  5                  10                  15

Lys Pro Ile Asp Arg Lys Ser Val His Gln Ile Cys Ser Gly Gln
                 20                  25                  30

Val Val Leu Ser Leu Ser Thr Ala Val Lys Glu Leu Val Glu Asn
                 35                  40                  45

Ser Leu Asp Ala Gly Ala Thr Asn Ile Asp Leu Lys Leu Lys Asp
                 50                  55                  60

Tyr Gly Val Asp Leu Ile Glu Val Ser Asp Asn Gly Cys Gly Val
                 65                  70                  75

Glu Glu Glu Asn Phe Glu Gly Leu Thr Leu Lys His His Thr Ser
                 80                  85                  90

Lys Ile Gln Glu Phe Ala Asp Leu Thr Gln Val Glu Thr Phe Gly
                 95                 100                 105

Phe Arg Gly Glu Ala Leu Ser Ser Leu Cys Ala Leu Ser Asp Val
                110                 115                 120

Thr Ile Ser Thr Cys His Ala Ser Ala Lys Val Gly Thr Arg Leu
                125                 130                 135

Met Phe Asp His Asn Gly Lys Ile Ile Gln Lys Thr Pro Tyr Pro
                140                 145                 150

Arg Pro Arg Gly Thr Thr Val Ser Val Gln Gln Leu Phe Ser Thr
                155                 160                 165

Leu Pro Val Arg His Lys Glu Phe Gln Arg Asn Ile Lys Lys Glu
                170                 175                 180

Tyr Ala Lys Met Val Gln Val Leu His Ala Tyr Cys Ile Ile Ser
                185                 190                 195

Ala Gly Ile Arg Val Ser Cys Thr Asn Gln Leu Gly Gln Gly Lys
                200                 205                 210

Arg Gln Leu Trp Tyr Ala Gln Val Glu Ala Pro Ala Ile Lys Glu
                215                 220                 225

Asn Ile Gly Ser Val Phe Gly Gln Lys Gln Leu Gln Ser Leu Ile
                230                 235                 240

Pro Phe Val Gln Leu Pro Pro Ser Asp Ser Val Cys Glu Glu Tyr
                245                 250                 255

Gly Leu Ser Cys Ser Asp Ala Leu His Asn Leu Phe Tyr Ile Ser
                260                 265                 270

Gly Phe Ile Ser Gln Cys Thr His Gly Val Gly Arg Ser Ser Thr
                275                 280                 285

Asp Arg Gln Phe Phe Phe Ile Asn Arg Arg Pro Cys Asp Pro Ala
                290                 295                 300

Lys Val Cys Arg Leu Val Asn Glu Val Tyr His Met Tyr Asn Arg
                305                 310                 315

His Gln Tyr Pro Phe Val Val Leu Asn Ile Ser Val Asp Ser Glu
                320                 325                 330

Cys Val Asp Ile Asn Val Thr Pro Asp Lys Arg Gln Ile Leu Leu
                335                 340                 345

Gln Glu Glu Lys Leu Leu Leu Ala Val Leu Lys Thr Ser Leu Ile
                350                 355                 360

Gly Met Phe Asp Ser Asp Val Asn Lys Leu Asn Val Ser Gln Gln
                365                 370                 375

Pro Leu Leu Asp Val Glu Gly Asn Leu Ile Lys Met His Ala Ala
                380                 385                 390

Asp Leu Glu Lys Pro Met Val Glu Lys Gln Asp Gln Ser Pro Ser
                395                 400                 405

Leu Arg Thr Gly Glu Glu Lys Lys Asp Val Ser Ile Ser Arg Leu
                410                 415                 420

Arg Glu Ala Phe Ser Leu Arg His Thr Thr Glu Asn Lys Pro His
                425                 430                 435

Ser Pro Lys Thr Pro Glu Pro Arg Arg Ser Pro Leu Gly Gln Lys
                440                 445                 450

Arg Gly Met Leu Ser Ser Ser Thr Ser Gly Ala Ile Ser Asp Lys
                455                 460                 465

Gly Val Leu Arg Pro Gln Lys Glu Ala Val Ser Ser Ser His Gly
                470                 475                 480

Pro Ser Asp Pro Thr Asp Arg Ala Glu Val Glu Lys Asp Ser Gly
                485                 490                 495

His Gly Ser Thr Ser Val Asp Ser Glu Gly Phe Ser Ile Pro Asp
                500                 505                 510

Thr Gly Ser His Cys Ser Ser Glu Tyr Ala Ala Ser Ser Pro Gly
                515                 520                 525

Asp Arg Gly Ser Gln Glu His Val Asp Ser Gln Glu Lys Ala Pro
                530                 535                 540

Glu Thr Asp Asp Ser Phe Ser Asp Val Asp Cys His Ser Asn Gln
                545                 550                 555

Glu Asp Thr Gly Cys Lys Phe Arg Val Leu Pro Gln Pro Thr Asn
                560                 565                 570

Leu Ala Thr Pro Asn Thr Lys Arg Phe Lys Lys Glu Glu Ile Leu
                575                 580                 585

Ser Ser Ser Asp Ile Cys Pro Gln Leu Val Asn Thr Gln Asp Met
                590                 595                 600

Ser Ala Ser Gln Val Asp Val Ala Val Lys Ile Asn Lys Lys Val
                605                 610                 615

Val Pro Leu Asp Phe Ser Met Ser Ser Leu Ala Lys Arg Ile Lys
                620                 625                 630

Gln Leu His His Glu Ala Gln Gln Ser Glu Gly Glu Gln Asn Tyr
                635                 640                 645

Arg Lys Phe Arg Ala Lys Ile Cys Pro Gly Glu Asn Gln Ala Ala
                650                 655                 660

Glu Asp Glu Leu Arg Lys Glu Ile Ser Lys Thr Met Phe Ala Glu
                665                 670                 675

Met Glu Ile Ile Gly Gln Phe Asn Leu Gly Phe Ile Ile Thr Thr
                680                 685                 690

Leu Asn Glu Asp Ile Phe Ile Val Asp Glu His Ala Thr Asp Glu
                695                 700                 705

Lys Tyr Asn Phe Glu Met Leu Gln Gln His Thr Val Leu Gln Gly
                710                 715                 720

Gln Arg Leu Ile Ala Pro Glu Thr Leu Asn Leu Thr Ala Val Asn
                725                 730                 735

Glu Ala Val Leu Ile Glu Asn Leu Glu Ile Phe Arg Lys Asn Gly
                740                 745                 750

Phe Asp Phe Val Ile Asp Glu Asn Ala Pro Val Thr Glu Arg Ala
                755                 760                 765

Lys Leu Ile Ser Leu Pro Thr Ser Lys Asn Trp Thr Phe Gly Pro
                770                 775                 780

Gln Asp Val Asp Glu Leu Ile Phe Met Leu Ser Asp Ser Pro Gly
                785                 790                 795

Val Met Cys Arg Pro Ser Arg Val Lys Gln Met Phe Ala Ser Arg
                800                 805                 810

Ala Cys Arg Lys Ser Val Met Ile Gly Thr Ala Leu Asn Thr Ser
                815                 820                 825

Glu Met Lys Lys Leu Ile Thr His Met Gly Glu Met Asp His Pro
                830                 835                 840

Trp Asn Cys Pro His Gly Arg Pro Thr Met Arg His Ile Ala Asn
                845                 850                 855

Leu Gly Val Ile Ser Gln Asn
                860

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 20 BASE PAIRS

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: Oligonucleotide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:



GTTGAACATC TAGACGTCTC 
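
Each oligonucleotide entry declares a LENGTH that can be compared with the sequence as printed, and a primer is often needed as its reverse complement. The helper below is an illustrative sketch only; the names `normalise` and `reverse_complement` are hypothetical and do not come from the listing.

```python
# Illustrative helper (hypothetical names): normalise a primer as printed in
# the listing, check its length, and compute its reverse complement.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def normalise(printed: str) -> str:
    """Remove the spacing between ten-base blocks and upper-case the bases."""
    return "".join(printed.split()).upper()


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string over A, C, G, T, N."""
    return seq.translate(COMPLEMENT)[::-1]


seq7 = normalise("GTTGAACATC TAGACGTCTC")  # SEQ ID NO:7 as listed
print(len(seq7))                 # compare with the declared LENGTH (20)
print(reverse_complement(seq7))
```

A length that disagrees with the declared value may point to a transcription error in the entry, so the same check can be repeated for the other oligonucleotides.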
(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 19 BASE PAIRS

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 
TCGTGGCAGG GGTTATTCG 

(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 19 BASE PAIRS

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: Oligonucleotide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

CTACCCAATG CCTCAACCG 

(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 22 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10 

GAGAACTGAT AGAAATTGGA TG 
(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 18 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 



(D) TOPOLOGY: LINEAR
(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11 
GGGACATGAG GTTCTCCG 

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 19 BASE PAIRS

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12 

GGGCTGTGTG AATCCTCAG 

(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 20 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13

CGGTTCACCA CTGTCTCGTC

(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 18 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: Oligonucleotide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14 

TCCAGGATGC TCTCCTCG 

(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 20 BASE PAIRS

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15 

CAAGTCCTGG TAGCAAAGTC 
(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 19 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: Oligonucleotide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16 

ATGGCAAGGT CAAAGAGCG 

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 22 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17 

CAACAATGTA TTCAGNAAGT CC 

(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 21 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18

TTGATACAAC ACTTTGTATC G 
(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 21 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19 

GGAATACTAT CAGAAGGCAA G 
(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 21 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20 

ACAGAGCAAG TTACTCAGAT G 
(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 20 BASE PAIRS

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21 

GTACACAATG CAGGCATTAG 
(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 21 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 



(ii) MOLECULE TYPE: Oligonucleotide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22
AATGTGGATG TTAATGTGCA C 
(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 19 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23 

CTGACCTCGT CTTCCTAC 

(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 19 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24 

CAGCAAGATG AGGAGATGC 

(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 21 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25 

GGAAATGGTG GAAGATGATT C 

(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 16 BASE PAIRS

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: Oligonucleotide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26 

CTTCTCAACA CCAAGC 

(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 21 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27 

GAAATTGATG AGGAAGGGAA C 
(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 22 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: Oligonucleotide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28

CTTCTGATTG ACAACTATGT GC

(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 22 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29 



CACAGAAGAT GGAAATATCC TG 



(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 20 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: Oligonucleotide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30 
GTGTTGGTAG CACTTAAGAC 

(2) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 20 BASE PAIRS

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: Oligonucleotide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31 

TTTCCCATAT TCTTCACTTG 

(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 19 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32 

GTAACATGAG CCACATGGC 

(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 19 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 



(ii) MOLECULE TYPE: Oligonucleotide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33 



CCACTGTCTC GTCCAGCCG 

(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 26 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34 

CGGGATCCAT GTCGTTCGTG GCAGGG

(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 26 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35 

GCTCTAGATT AACACCTCTC AAAGAC 
(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 21 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36 

GCATCTAGAC GTTrCCTTGG C 
(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 20 BASE PAIRS

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: Oligonucleotide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37
CATCCAAGCT TCTGTTCCCG 

(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS 
(A) LENGTH: 19 BASE PAIRS 
(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38 

GGGGTGCAGC AGCACATCG 

(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 20 BASE PAIRS

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39 

GGAGGCAGAA TGTGTGAGCG 

(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 19 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40 

TCCCAAAGAA GGACTTGCT 

(2) INFORMATION FOR SEQ ID NO: 41:

(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 22 BASE PAIRS

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: Oligonucleotide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41

AGTATAAGTC TTAAGTGCTA CC 
(2) INFORMATION FOR SEQ ID NO: 42:

(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 20 BASE PAIRS

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: Oligonucleotide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42

TTTATGGTTT CTCACCTGCC 

(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 19 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43 

GTTATCTGCC CACCTCAGC 

(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 59 BASE PAIRS

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44

GGATCCTAAT ACGACTCACT ATAGGGAGAC CACCATGGCA TCTAGACGTT TCCCTTGGC
(2) INFORMATION FOR SEQ ID NO: 45:

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 20 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45:

CATCCAAGCT TCTGTTCCCG

(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 56 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46: 

GGATCCTAAT ACGACTCACT ATAGGGAGAC CACCATGGGG GTGCAGCAGC ACATCG 
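
The longer oligonucleotides appear to be composite primers. As a hedged illustration, the sketch below searches SEQ ID NO:46 for a BamHI recognition site, the commonly used T7 promoter sequence, the first ATG, and the full sequence of SEQ ID NO:38. That the primer was designed around these elements is an inference from the sequence alone, not something this listing states, and the variable names are illustrative.

```python
# Illustrative sketch: locate recognisable elements in SEQ ID NO:46.
T7_PROMOTER = "TAATACGACTCACTATAGGG"  # widely used T7 promoter sequence
BAMHI_SITE = "GGATCC"                 # BamHI recognition site

seq46 = "".join(
    "GGATCCTAAT ACGACTCACT ATAGGGAGAC CACCATGGGG GTGCAGCAGC ACATCG".split()
)
seq38 = "".join("GGGGTGCAGC AGCACATCG".split())  # SEQ ID NO:38 as listed

# Zero-based offset of each element within SEQ ID NO:46 (-1 if absent).
features = {
    "BamHI site": seq46.find(BAMHI_SITE),
    "T7 promoter": seq46.find(T7_PROMOTER),
    "first ATG": seq46.find("ATG"),
    "SEQ ID NO:38": seq46.find(seq38),
}
for name, offset in features.items():
    print(f"{name}: offset {offset}")
```

In this reading the 3' end of SEQ ID NO:46 is identical to SEQ ID NO:38 and follows directly after the ATG.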
(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 20 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47: 

GGAGGCAGAA TGTGTGAGCG 

(2) INFORMATION FOR SEQ ID NO: 48: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 28 BASE PAIRS

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: Oligonucleotide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48

CGGGATCCAT GAAACAATTG CCTGCGGC 
(2) INFORMATION FOR SEQ ID NO: 49: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 26 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49

GCTCTAGACC AGACTCATGC TGTTTT 
(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 26 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50 

CGGGATCCAT GGAGCGAGCT GAGAGC 
(2) INFORMATION FOR SEQ ID NO: 51: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 23 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51 

GCTCTAGAGT GAAGACTCTG TCT 

(2) INFORMATION FOR SEQ ID NO: 52: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 20 BASE PAIRS

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52 
AAGCTGCTCT GTTAAAAGCG 

(2) INFORMATION FOR SEQ ID NO: 53: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 18 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53 

GCACCAGCAT CCAAGGAG 

(2) INFORMATION FOR SEQ ID NO: 54: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 19 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54 

CAACCATGAG ACACATCGC 

(2) INFORMATION FOR SEQ ID NO: 55: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 20 BASE PAIRS

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55 

AGGTTAGTGA AGACTCTGTC

(2) INFORMATION FOR SEQ ID NO: 56:

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 53 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56: 

GGATCCTAAT ACGACTCACT ATAGGGAGAC CACCATGGAA CAATTGCCTG CGG 
(2) INFORMATION FOR SEQ ID NO: 57: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 18 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57: 
CCTGCTCCAC TCATCTGC 

(2) INFORMATION FOR SEQ ID NO: 58: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 60 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58: 

GGATCCTAAT ACGACTCACT ATAGGGAGAC CACCATGGAA GATATCTTAA AGTTAATCCG 
(2) INFORMATION FOR SEQ ID NO: 59: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 21 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59:
GGCTTCTTCT ACTCTATATG G 
(2) INFORMATION FOR SEQ ID NO: 60: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 58 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60: 

GGATCCTAAT ACGACTCACT ATAGGGAGAC CACCATGGCA GGTCTTGAAA ACTCTTCG
(2) INFORMATION FOR SEQ ID NO: 61: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 21 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61: 

AAAACAAGTC AGTGAATCCT C 
(2) INFORMATION FOR SEQ ID NO: 62: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 20 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62: 

AAGCACATCT GTTTCTGCTG 

(2) INFORMATION FOR SEQ ID NO: 63: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 20 BASE PAIRS

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 63:
ACGAGTAGAT TCCTTTAGGC 

(2) INFORMATION FOR SEQ ID NO: 64: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 19 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64: 

CAGAACTGAC ATGAGAGCC 

(2) INFORMATION FOR SEQ ID NO: 65: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 52 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 65: 

GGATCCTAAT ACGACTCACT ATAGGGAGAC CACCATGGAG CGAGCTGAGA GC 
(2) INFORMATION FOR SEQ ID NO: 66: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 20 BASE PAIRS

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:66: 

AGGTTAGTGA AGACTCTGTC 



(2) INFORMATION FOR SEQ ID NO: 67:

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 17 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67: 
CTGAGGTCTC AGCAGGC 

(2) INFORMATION FOR SEQ ID NO: 68: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 57 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:68: 

GGATCCTAAT ACGACTCACT ATAGGGAGAC CACCATGGTG TCCATTTCCA GACTGCG 
(2) INFORMATION FOR SEQ ID NO: 69: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 20 BASE PAIRS

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 69: 

AGGTTAGTGA AGACTCTGTC 

(2) INFORMATION FOR SEQ ID NO: 70: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 21 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 70:

TTATTTGGCA GAAAAGCAGA G

(2) INFORMATION FOR SEQ ID NO: 71: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 21 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 71 

TTAAAAGACT AACCTCTTGC C 
(2) INFORMATION FOR SEQ ID NO: 72:

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 21 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: Oligonucleotide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 72 
CTGCTGTTAT GAACAATATG G 
(2) INFORMATION FOR SEQ ID NO: 73: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 19 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 73 

CAGAAGCAGT TGCAAAGCC 

(2) INFORMATION FOR SEQ ID NO: 74: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 20 BASE PAIRS

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: Oligonucleotide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 74

AAACCGTACT CTTCACACAC 
(2) INFORMATION FOR SEQ ID NO: 75: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 20 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 75 

GAGGAAAAGC TTTTGTTGGC 

(2) INFORMATION FOR SEQ ID NO: 76: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 18 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 76 

CAGTGGCTGC TGACTGAC 

(2) INFORMATION FOR SEQ ID NO: 77: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 19 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 77 

TCCAGAACCA AGAAGGAGC 

(2) INFORMATION FOR SEQ ID NO: 78: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 16 BASE PAIRS

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 78

TGAGGTCTCA GCAGGC

WHAT IS CLAIMED IS:

1. An isolated polynucleotide selected from the group
consisting of:

(a) a polynucleotide encoding a polypeptide having the 
deduced amino acid sequence of SEQ ID No. 2 or a fragment, analog 
or derivative of said polypeptide; 

(b) a polynucleotide encoding a polypeptide having the 
amino acid sequence encoded by the cDNA contained in ATCC Deposit
No. 75649;

(c) a polynucleotide encoding a polypeptide having the 
deduced amino acid sequence of SEQ ID No. 4 or a fragment, analog 
or derivative of said polypeptide; 

(d) a polynucleotide encoding a polypeptide having the 
amino acid sequence encoded by the cDNA contained in ATCC Deposit 
No. 75651; 

(e) a polynucleotide encoding a polypeptide having the 
deduced amino acid sequence of SEQ ID No. 6 or a fragment, analog 
or derivative of said polypeptide; and 

(f) a polynucleotide encoding a polypeptide having the 
amino acid sequence encoded by the cDNA contained in ATCC Deposit 
No. 75650.

2. The polynucleotide of Claim 1 wherein the
polynucleotide is DNA. 

3. The polynucleotide of Claim 1 wherein the
polynucleotide is RNA. 

4. The polynucleotide of Claim 1 wherein the
polynucleotide is genomic DNA. 

5. The polynucleotide sequence of claim 1 for use in
analyzing a sample for mutation of a polynucleotide sequence
encoding a human mismatch repair protein comprising:

a polynucleotide sequence of at least 15 and no more
than 30 consecutive bases of the polynucleotide sequence of ATCC
Deposit No. 75649.

6. The polynucleotide sequence of claim 1 for use in
analyzing a sample for mutation of a polynucleotide sequence
encoding a human mismatch repair protein comprising:

a polynucleotide sequence of at least 15 and no more
than 30 consecutive bases of the polynucleotide sequence of
ATCC Deposit No. 75651.

7. The polynucleotide sequence of claim 1 for use in
analyzing a sample for mutation of a polynucleotide sequence
encoding a human mismatch repair protein comprising:

a polynucleotide sequence of at least 15 and no more
than 30 consecutive bases of the polynucleotide sequence of
ATCC Deposit No. 75650.

8. The polynucleotide of Claim 2 wherein said 
polynucleotide encodes a polypeptide having the deduced amino 
acid sequence of SEQ ID No. 2. 

9. The polynucleotide of Claim 2 wherein said
polynucleotide encodes a polypeptide having the deduced amino 
acid sequence of SEQ ID No. 4. 

10. The polynucleotide of Claim 2 wherein said 
polynucleotide encodes a polypeptide having the deduced amino 
acid sequence of SEQ ID No. 6. 

11. The polynucleotide of Claim 2 wherein said 
polynucleotide encodes a polypeptide encoded by the cDNA of ATCC 
Deposit No. 75649.

12. The polynucleotide of Claim 2 wherein said
polynucleotide encodes a polypeptide encoded by the cDNA of ATCC 
Deposit No. 75651. 

13. The polynucleotide of Claim 2 wherein said
polynucleotide encodes a polypeptide encoded by the cDNA of ATCC
Deposit No. 75650.

14. The polynucleotide of Claim 1 having the coding
sequence of SEQ ID No. 1. 

15. The polynucleotide of Claim 1 having the coding 
sequence of SEQ ID No. 3. 

16. The polynucleotide of Claim 1 having the coding 
sequence of SEQ ID No. 5.

17. A vector containing the DNA of Claim 2. 

18. A host cell genetically engineered with the vector of
Claim 17. 

19. A process for producing a polypeptide comprising:
expressing from the host cell of Claim 18 the polypeptide encoded 
by said DNA. 

20. A process for producing cells capable of expressing a 
polypeptide comprising genetically engineering cells with the 
vector of Claim 17. 

21. An isolated DNA hybridizable to the DNA of Claim 2 and 
encoding a polypeptide having hMLH1 activity.

22. An isolated DNA hybridizable to the DNA of Claim 2 and
encoding a polypeptide having hMLH2 activity.

23. An isolated DNA hybridizable to the DNA of Claim 2 and
encoding a polypeptide having hMLH3 activity. 

24. A polypeptide selected from the group consisting of:

(a) a polypeptide having the deduced amino acid 
sequence of SEQ ID No. 2 and fragments, analogs and derivatives
thereof;

(b) a polypeptide encoded by the cDNA of ATCC Deposit 
No. 75649 and fragments, analogs and derivatives of said
polypeptide; 

(c) a polypeptide having the deduced amino acid 
sequence of SEQ ID No. 4 and fragments, analogs and derivatives 
thereof;

(d) a polypeptide encoded by the cDNA of ATCC Deposit 
No. 75651 and fragments, analogs and derivatives of said 
polypeptide; 

(e) a polypeptide having the deduced amino acid 
sequence of SEQ ID No. 6 and fragments, analogs and derivatives 
thereof; and

(f) a polypeptide encoded by the cDNA of ATCC Deposit
No. 75650 and fragments, analogs and derivatives of said
polypeptide.

25. The polypeptide of Claim 15 wherein the polypeptide is 
hMLH1 having the deduced amino acid sequence of SEQ ID No. 2.

26. The polypeptide of Claim 14 wherein the polypeptide is 
hMLH2 having the deduced amino acid sequence of SEQ ID No. 4. 

27. The polypeptide of Claim 14 wherein the polypeptide is 
hMLH3 having the deduced amino acid sequence of SEQ ID No. 6.

28. A process for diagnosing a susceptibility to cancer 
comprising:

determining from a sample derived from a human patient
a mutation in a human mismatch repair gene, said human mismatch
repair gene comprising the polynucleotide sequence of claim 8.

29. A process for diagnosing a susceptibility to cancer 
comprising:

determining from a sample derived from a human patient 
a mutation in a human mismatch repair gene, said human mismatch 
repair gene comprising the DNA of claim 9. 

30. A process for diagnosing a susceptibility to cancer 
comprising:

determining from a sample derived from a human patient 
a mutation in a human mismatch repair gene, said human mismatch 
repair gene comprising the DNA of claim 10. 

31. A process for diagnosing a susceptibility to cancer 
comprising:

determining from a sample derived from a human patient 
a mutation in a human DNA mismatch repair gene which encodes the 
human homolog of a bacterial mutL DNA mismatch repair gene.